Posted to solr-user@lucene.apache.org by yu shen <sh...@gmail.com> on 2012/01/09 02:44:16 UTC

Doing url search in solr is slow

Hi,

My Solr documents have up to 20 fields, containing data such as product name,
date, URL, etc.

The index holds around 1.5 million documents.

My symptom is that a URL search like [ url:*www.someurl.com*
referal_url:*www.someurl.com* page_url:*www.someurl.com*] gets an
extraordinarily long response time, while searches against all other fields
have normal response times.

Can anyone share any insights on this?

Spark

Re: Doing url search in solr is slow

Posted by yu shen <sh...@gmail.com>.
Hi Erick,

I changed all my URL fields to text (they were string fields before) and
added a WordDelimiterFilterFactory, so that the URL fields are tokenized
into several words. But I still get a response time of around 15 seconds,
measured using debugQuery=on, and most of the time is still spent in
DebugComponent. The query I used did not have any prepended asterisk.
(Excuse me if the context description is still not complete enough.)
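
For illustration only, here is a rough Python sketch of the difference between an untokenized 'string' field and a tokenized 'text' field for a URL value. The function name split_like_word_delimiter and its splitting rule are assumptions made for this example; it is not Solr's WordDelimiterFilterFactory logic, only a stand-in that splits on non-alphanumeric characters.

```python
import re


def split_like_word_delimiter(url: str) -> list[str]:
    """Hypothetical stand-in for a delimiter-based analysis chain:
    split on runs of non-alphanumeric characters and lowercase the parts."""
    return [part.lower() for part in re.split(r"[^A-Za-z0-9]+", url) if part]


url = "http://www.someurl.com/sch/i.html"

# A 'string' field keeps the whole value as one indexed term.
string_terms = [url]

# A tokenized 'text' field indexes several smaller terms instead.
text_terms = split_like_word_delimiter(url)

print(string_terms)
print(text_terms)
```

With the tokenized form, a trailing-wildcard query on the full URL no longer lines up with a single indexed term, so the query itself usually has to change along with the field type.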

Is there any other way to improve the query performance?

Spark


Re: Doing url search in solr is slow

Posted by yu shen <sh...@gmail.com>.
Hi Erick,

I only added debugQuery=on to the URL, and did not do any configuration
with regard to DebugComponent. It seems the 'string' type should be
replaced with the 'text' type.

I will paste the results here after I have done some experiments.

Spark

> > 2012/1/9 Erick Erickson <er...@gmail.com>
> >
> >> Yu Shen & Arian:
> >>
> >> We can't help much without more information. In particular, how are
> >> the fields in question analyzed? What is the result of looking
> >> at the admin/analysis page? What do you get when you
> >> attach &debugQuery=on to the query?
> >>
> >> You might review:
> >> http://wiki.apache.org/solr/UsingMailingLists
> >>
> >> But at a wild guess, you have something like WordDelimiterFilterFactory
> >> in your analysis chain, and it's splitting up your input into
> >> "www" "someurl" "com" as separate tokens, and www matches
> >> all documents so Solr is having to score all documents in your corpus, but
> >> that's just a guess. See the admin/schema browser page and find the most
> >> frequent terms for the field in question, that should indicate whether
> >> you have some tokens that appear in all docs. Try searching on
> >> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
> >>
> >> Best
> >> Erick
> >>
> >> 2012/1/9 François Schiettecatte <fs...@gmail.com>:
> >> > About the search 'referal_url:*www.someurl.com*', having a wildcard at
> >> > the start will cause a dictionary scan for every term you search on unless
> >> > you use ReversedWildcardFilterFactory. That could be the cause of your
> >> > slowdown if you are I/O bound, and even if you are CPU bound for that
> >> > matter.
> >> >
> >> > François
> >> >
> >> >
> >> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> My solr document has up to 20 fields, containing data from product name,
> >> >> date, url etc.
> >> >>
> >> >> The volume of documents is around 1.5m.
> >> >>
> >> >> My symptom is when doing url search like [ url:*www.someurl.com*
> >> >> referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
> >> >> extraordinary long response time, while search against all other fields,
> >> >> the response time will be normal.
> >> >>
> >> >> Can anyone share any insights on this?
> >> >>
> >> >> Spark
> >> >
> >>
>
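
On the ReversedWildcardFilterFactory suggestion quoted above, the underlying idea can be sketched in a few lines of Python: if each term is also stored reversed, a leading-wildcard pattern such as *someurl.com becomes a prefix lookup on the reversed terms. This is a minimal sketch under stated assumptions, not Solr's implementation; build_reversed_index and suffix_matches are hypothetical names, and the term list is made up.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Store every term reversed, in sorted order (hypothetical helper)."""
    return sorted(term[::-1] for term in terms)


def suffix_matches(reversed_sorted, suffix):
    """Answer a '*suffix' pattern with a prefix search on the reversed terms."""
    key = suffix[::-1]
    i = bisect_left(reversed_sorted, key)
    matches = []
    # All reversed terms starting with key are contiguous from position i.
    while i < len(reversed_sorted) and reversed_sorted[i].startswith(key):
        matches.append(reversed_sorted[i][::-1])
        i += 1
    return matches


terms = ["www.someurl.com", "shop.someurl.com", "www.other.org"]
index = build_reversed_index(terms)
found = suffix_matches(index, "someurl.com")
print(found)
```

The sketch only covers a single leading wildcard. A pattern with wildcards on both ends, like the one in the original question, is not turned into a prefix lookup by this trick.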

Re: Doing url search in solr is slow

Posted by Erick Erickson <er...@gmail.com>.
Do you by chance have debugQuery on by default?
Because if you look down in the "timing" section,
you can see the time the various components took to do
their work; there are two sections, "prepare" and "process".

The cumulative time is 17.156 seconds, of which 17.156
seconds is reported to be in the DebugComponent.
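
As a reading aid, the "timing" section can be summarized with a short Python script. This is only a sketch: SAMPLE is a trimmed copy of the timing block from the debug output in this thread (two components per phase instead of six), and component_times is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "timing" block from the pasted debug output.
SAMPLE = """
<lst name="timing">
  <double name="time">17156.0</double>
  <lst name="prepare">
    <double name="time">0.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process">
    <double name="time">17156.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
  </lst>
</lst>
"""


def component_times(timing_xml):
    """Return (phase, component, milliseconds) rows, slowest first."""
    root = ET.fromstring(timing_xml.strip())
    rows = []
    for phase in root.findall("lst"):        # "prepare" and "process"
        for comp in phase.findall("lst"):    # one entry per search component
            ms = float(comp.find("double").text)
            short_name = comp.get("name").rsplit(".", 1)[-1]
            rows.append((phase.get("name"), short_name, ms))
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = component_times(SAMPLE)
for phase, comp, ms in rows:
    print(f"{phase:8s} {comp:16s} {ms:10.1f} ms")
```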

So what happens if you just turn that component off? Because
I don't see anything in your output that really looks like it is
taking any time. Of course, if you've changed your code from
*url* to url*, that will account for a time difference too, since the infix case
requires that every term in the fields in question be examined.
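
To make the prefix versus infix difference concrete, here is a small Python sketch over a sorted list of terms. It is a simplified model of a term dictionary, not Lucene's actual data structure; the function names and the sample terms are assumptions made for the example.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """url* case: binary-search to the first candidate, then read forward
    only while terms still start with the prefix."""
    i = bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    while i < len(sorted_terms):
        examined += 1
        if not sorted_terms[i].startswith(prefix):
            break
        matches.append(sorted_terms[i])
        i += 1
    return matches, examined


def infix_matches(sorted_terms, fragment):
    """*url* case: sort order does not help, so every term is examined."""
    matches = [term for term in sorted_terms if fragment in term]
    return matches, len(sorted_terms)


terms = sorted([
    "http://a.example/x",
    "http://www.someurl.com/sch/i.html",
    "http://www.someurl.com/sch/j.html",
    "http://z.example/y",
])

prefix_hits, prefix_examined = prefix_matches(terms, "http://www.someurl.com/sch/i.html")
infix_hits, infix_examined = infix_matches(terms, "www.someurl.com")
print(prefix_hits, prefix_examined)
print(infix_hits, infix_examined)
```

In this toy model the prefix lookup touches only the matching terms plus one, while the infix lookup always touches the whole list, which is why the infix form grows more expensive as the number of distinct terms grows.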

About WordDelimiterFilterFactory: that is irrelevant for a "string"
type. It's an open question whether a string type is what you
want, but that is determined by your problem space. You might
spend some time with admin/analysis to see the effects of
various analysis chains. "string" is used when you want no
tokenization, no case transformations etc.

Best
Erick

On Mon, Jan 9, 2012 at 10:04 AM, yu shen <sh...@gmail.com> wrote:
> Hi Erick,
>
> Thanks for your reply. Actually I did the following search:
> survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*
>
> I did not prepend any asterisk to the field values, but only appended one to them.
>
> I analyzed the url field on the Solr admin page, and it gave me this, meaning the
> url is not tokenized. I noticed you mentioned a WordDelimiterFilterFactory.
> Do I need to configure it in schema.xml or somewhere else?
> term position      1
> term text          http://www.someurl.com/sch/i.html*
> term type          word
> source start,end   0,31
> I added debugQuery=on to the query URL and got this (sorry to paste such
> long, cryptic output here; it is really mysterious to me)
> <lst name="debug">
>    <str name="rawquerystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
>    <str name="querystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
>    <str name="parsedquery">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
>    <str name="parsedquery_toString">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
>    <lst name="explain">
>        <str name="5007688343">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
>        </str>
>        <str name="5007648909">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
>        </str>
>        <str name="5007653989">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
>        </str>
>        <str name="5007709065">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
>        </str>
>        <str name="5007710379">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str><str name="5007739634">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str><str name="5007753066">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str><str name="5007756045">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str><str name="5007832978">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str><str name="5007849124">
> 0.76980036 = (MATCH) product of:
>  1.1547005 = (MATCH) sum of:
>    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
> http://www.someurl.com/sch/i.html*), product of:
>      1.0 = boost
>      0.57735026 = queryNorm
>  0.6666667 = coord(2/3)
> </str>
>    </lst>
>    <str name="QParser">LuceneQParser</str>
>    <lst name="timing">
>        <double name="time">17156.0</double>
>        <lst name="prepare">
>            <double name="time">0.0</double>
>            <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
>        </lst>
>        <lst name="process">
>            <double name="time">17156.0</double>
>            <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>            <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
>        </lst>
>    </lst>
> </lst>
>
>
>
> 2012/1/9 Erick Erickson <er...@gmail.com>
>
>> Yu Shen & Arian:
>>
>> We can't help much without more information. In particular, how are
>> the fields in question analyzed? What is the result of looking
>> at the admin/analysis page? What do you get when you
>> attach &debugQuery=on to the query?
>>
>> You might review:
>> http://wiki.apache.org/solr/UsingMailingLists
>>
>> But at a wild guess, you have something like WordDelimiterFilterFactory
>> in your analysis chain, and it's splitting up your input into
>> "www" "someurl" "com" as separate tokens, and www matches
>> all documents so Solr is having to score all documents in your corpus, but
>> that's just a guess. See the admin/schema browser page and find the most
>> frequent terms for the field in question, that should indicate whether
>> you have some tokens that appear in all docs. Try searching on
>> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
>>
>> Best
>> Erick
>>
>> 2012/1/9 François Schiettecatte <fs...@gmail.com>:
>> > About the search 'referal_url:*www.someurl.com*', having a wildcard at
>> the start will cause a dictionary scan for every term you search on unless
>> you use ReversedWildcardFilterFactory. That could be the cause of your
>> slowdown if you are I/O bound, and even if you are CPU bound for that
>> matter.
>> >
>> > François
>> >
>> >
>> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
>> >
>> >> Hi,
>> >>
>> >> My solr document has up to 20 fields, containing data from product name,
>> >> date, url etc.
>> >>
>> >> The volume of documents is around 1.5m.
>> >>
>> >> My symptom is when doing url search like [ url:*www.someurl.com*
>> >> referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
>> >> extraordinary long response time, while search against all other fields,
>> >> the response time will be normal.
>> >>
>> >> Can anyone share any insights on this?
>> >>
>> >> Spark
>> >
>>

Re: Doing url search in solr is slow

Posted by yu shen <sh...@gmail.com>.
Hi Erick,

Thanks for your reply. Actually, I ran the following search:
survey_url:http\://www.someurl.com/sch/i.html*
referal_url:http\://www.someurl.com/sch/i.html*
page_url:http\://www.someurl.com/sch/i.html*

I did not prepend an asterisk to the field values; I only appended one.
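
For anyone building this kind of query from code, here is a minimal sketch of escaping the colon and appending a single trailing wildcard. The helper names (escape_term, prefix_clause) are made up for illustration, and the set of escaped characters is an assumption based on the classic Lucene query syntax, not something taken from this thread.

```python
# Characters escaped here are an assumption based on the classic Lucene
# query syntax; Lucene 4.0 and later also treat "/" as special.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\&|')


def escape_term(text):
    """Backslash-escape Lucene query syntax characters in a raw term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def prefix_clause(field, value):
    """Build field:value* with the value escaped and one trailing wildcard."""
    return f"{field}:{escape_term(value)}*"


# Field names and URL are the ones used in the search above.
fields = ["survey_url", "referal_url", "page_url"]
url = "http://www.someurl.com/sch/i.html"
query = " ".join(prefix_clause(f, url) for f in fields)
print(query)
```

The wildcard is added after escaping on purpose, so that the trailing asterisk stays a wildcard instead of becoming a literal character.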

I analyzed the url field on the Solr admin analysis page and got the
output below, which means the url is not tokenized. You mentioned
WordDelimiterFilterFactory. Do I need to configure it in schema.xml or
somewhere else?

term position      1
term text          http://www.someurl.com/sch/i.html*
term type          word
source start,end   0,31

I added debugQuery=on to the query url and got the following (sorry for
pasting such long, cryptic output; it is really mysterious to me):
<lst name="debug">
    <str name="rawquerystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
    <str name="querystring">survey_url:http\://www.someurl.com/sch/i.html* referal_url:http\://www.someurl.com/sch/i.html* page_url:http\://www.someurl.com/sch/i.html*</str>
    <str name="parsedquery">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
    <str name="parsedquery_toString">survey_url:http://www.someurl.com/sch/i.html* referal_url:http://www.someurl.com/sch/i.html* page_url:http://www.someurl.com/sch/i.html*</str>
    <lst name="explain">
        <str name="5007688343">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
        </str>
        <str name="5007648909">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
        </str>
        <str name="5007653989">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
        </str>
        <str name="5007709065">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
        </str>
        <str name="5007710379">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str><str name="5007739634">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str><str name="5007753066">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str><str name="5007756045">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str><str name="5007832978">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str><str name="5007849124">
0.76980036 = (MATCH) product of:
  1.1547005 = (MATCH) sum of:
    0.57735026 = (MATCH) ConstantScoreQuery(referal_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
    0.57735026 = (MATCH) ConstantScoreQuery(page_url:
http://www.someurl.com/sch/i.html*), product of:
      1.0 = boost
      0.57735026 = queryNorm
  0.6666667 = coord(2/3)
</str></lst><str name="QParser">LuceneQParser</str><lst
name="timing"><double name="time">17156.0</double><lst
name="prepare"><double name="time">0.0</double><lst
name="org.apache.solr.handler.component.QueryComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.FacetComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.MoreLikeThisComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.HighlightComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.StatsComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.DebugComponent"><double
name="time">0.0</double></lst></lst><lst name="process"><double
name="time">17156.0</double><lst
name="org.apache.solr.handler.component.QueryComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.FacetComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.MoreLikeThisComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.HighlightComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.StatsComponent"><double
name="time">0.0</double></lst><lst
name="org.apache.solr.handler.component.DebugComponent"><double
name="time">17156.0</double></lst></lst></lst></lst>
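
The timing section at the end of that output is easier to read once it is parsed. The following is a rough sketch, not anything Solr ships: it assumes the debug block has been saved as a string, uses an abbreviated copy of the timing section above (two of the six components), and the function name component_times is made up.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the "timing" section pasted above; only two of the
# six components are kept so the sketch stays short.
TIMING_XML = """
<lst name="timing"><double name="time">17156.0</double>
  <lst name="prepare"><double name="time">0.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process"><double name="time">17156.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">17156.0</double></lst>
  </lst>
</lst>
"""


def component_times(timing_xml):
    """Return {(phase, component): milliseconds} from a Solr timing block."""
    root = ET.fromstring(timing_xml)
    times = {}
    for phase in root.findall("lst"):        # "prepare" and "process"
        for comp in phase.findall("lst"):    # one entry per search component
            short_name = comp.get("name").rsplit(".", 1)[-1]
            times[(phase.get("name"), short_name)] = float(comp.find("double").text)
    return times


times = component_times(TIMING_XML)
slowest = max(times, key=times.get)
print(slowest, times[slowest])
```

On the numbers pasted above, the whole 17156 ms is attributed to DebugComponent in the process phase, while QueryComponent reports 0.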



2012/1/9 Erick Erickson <er...@gmail.com>

> Yu Shen & Arian:
>
> We can't help much without more information. In particular, how are
> the fields in question analyzed? What is the result of looking
> at the admin/analysis page? What do you get when you
> attach &debugQuery=on to the query?
>
> You might review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> But at a wild guess, you have something like WordDelimiterFilterFactory
> in your analysis chain, and it's splitting up your input into
> "www" "someurl" "com" as separate tokens, and www matches
> all documents so Solr is having to score all documents in your corpus, but
> that's just a guess. See the admin/schema browser page and find the most
> frequent terms for the field in question, that should indicate whether
> you have some tokens that appear in all docs. Try searching on
> plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
>
> Best
> Erick
>
> 2012/1/9 François Schiettecatte <fs...@gmail.com>:
> > About the search 'referal_url:*www.someurl.com*', having a wildcard at
> the start will cause a dictionary scan for every term you search on unless
> you use ReversedWildcardFilterFactory. That could be the cause of your
> slowdown if you are I/O bound, and even if you are CPU bound for that
> matter.
> >
> > François
> >
> >
> > On Jan 8, 2012, at 8:44 PM, yu shen wrote:
> >
> >> Hi,
> >>
> >> My solr document has up to 20 fields, containing data from product name,
> >> date, url etc.
> >>
> >> The volume of documents is around 1.5m.
> >>
> >> My symptom is when doing url search like [ url:*www.someurl.com*
> >> referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
> >> extraordinary long response time, while search against all other fields,
> >> the response time will be normal.
> >>
> >> Can anyone share any insights on this?
> >>
> >> Spark
> >
>

Re: Doing url search in solr is slow

Posted by Erick Erickson <er...@gmail.com>.
Yu Shen & Arian:

We can't help much without more information. In particular, how are
the fields in question analyzed? What is the result of looking
at the admin/analysis page? What do you get when you
attach &debugQuery=on to the query?

You might review:
http://wiki.apache.org/solr/UsingMailingLists

But at a wild guess, you have something like WordDelimiterFilterFactory
in your analysis chain, and it's splitting up your input into
"www" "someurl" "com" as separate tokens. "www" matches all documents,
so Solr has to score every document in your corpus. That's just a guess,
though. See the admin/schema browser page and find the most frequent
terms for the field in question; that should indicate whether
you have some tokens that appear in all docs. Try searching on
plain "someurl". Is that slow? Or "someurl.anotherpart" or whatever.
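
To make the guess concrete, here is a small sketch of the effect. It is only a stand-in: the regex split is a rough approximation of delimiter-based tokenizing, not the actual WordDelimiterFilterFactory, and the three URLs are hypothetical sample values.

```python
import re
from collections import Counter

# Hypothetical sample values; the real index holds about 1.5m documents.
urls = [
    "http://www.someurl.com/sch/i.html",
    "http://www.another.org/index.html",
    "http://www.example.net/a/b",
]


def split_on_delimiters(text):
    """Rough stand-in for delimiter-based tokenizing: split on non-alphanumerics."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


# Document frequency: in how many documents does each token appear?
doc_freq = Counter()
for url in urls:
    doc_freq.update(set(split_on_delimiters(url)))

for term, df in doc_freq.most_common(3):
    print(term, df)
```

Tokens such as "http" and "www" end up with a document frequency equal to the number of documents, while "someurl" stays rare. That is the pattern to look for among the most frequent terms in the schema browser.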

Best
Erick

2012/1/9 François Schiettecatte <fs...@gmail.com>:
> About the search 'referal_url:*www.someurl.com*', having a wildcard at the start will cause a dictionary scan for every term you search on unless you use ReversedWildcardFilterFactory. That could be the cause of your slowdown if you are I/O bound, and even if you are CPU bound for that matter.
>
> François
>
>
> On Jan 8, 2012, at 8:44 PM, yu shen wrote:
>
>> Hi,
>>
>> My solr document has up to 20 fields, containing data from product name,
>> date, url etc.
>>
>> The volume of documents is around 1.5m.
>>
>> My symptom is when doing url search like [ url:*www.someurl.com*
>> referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
>> extraordinary long response time, while search against all other fields,
>> the response time will be normal.
>>
>> Can anyone share any insights on this?
>>
>> Spark
>

Re: Doing url search in solr is slow

Posted by François Schiettecatte <fs...@gmail.com>.
About the search 'referal_url:*www.someurl.com*', having a wildcard at the start will cause a dictionary scan for every term you search on unless you use ReversedWildcardFilterFactory. That could be the cause of your slowdown if you are I/O bound, and even if you are CPU bound for that matter.
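
A toy illustration of why the leading wildcard hurts, and of the idea behind indexing reversed terms. This is a sketch over a hypothetical in-memory term list; it does not show how ReversedWildcardFilterFactory is implemented, and the function names are made up.

```python
from bisect import bisect_left

# Hypothetical term dictionary for an untokenized url field, kept sorted
# the way an inverted index keeps its terms.
terms = sorted([
    "http://shop.someurl.com",
    "http://www.another.org",
    "http://www.someurl.com",
    "https://www.someurl.com",
])


def prefix_matches(sorted_terms, prefix):
    """Trailing wildcard (prefix*): binary-search to the start, read a contiguous run."""
    i = bisect_left(sorted_terms, prefix)
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


def suffix_matches_scan(sorted_terms, suffix):
    """Leading wildcard (*suffix): sort order does not help, every term is checked."""
    return [t for t in sorted_terms if t.endswith(suffix)]


# Storing reversed terms turns *suffix into a prefix lookup on the reversed form.
reversed_terms = sorted(t[::-1] for t in terms)


def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Answer *suffix by a prefix lookup over the reversed terms."""
    hits = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(h[::-1] for h in hits)


print(suffix_matches_scan(terms, "someurl.com"))
print(suffix_matches_reversed(reversed_terms, "someurl.com"))
```

Both calls return the same terms, but only the first one has to touch the whole dictionary. A pattern with a wildcard on both ends, like the one in the original post, still has a wildcard in front whichever way the term is stored.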

François


On Jan 8, 2012, at 8:44 PM, yu shen wrote:

> Hi,
> 
> My solr document has up to 20 fields, containing data from product name,
> date, url etc.
> 
> The volume of documents is around 1.5m.
> 
> My symptom is when doing url search like [ url:*www.someurl.com*
> referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
> extraordinary long response time, while search against all other fields,
> the response time will be normal.
> 
> Can anyone share any insights on this?
> 
> Spark


Re: Doing url search in solr is slow

Posted by Arian <ar...@informant.com.br>.
I face a similar problem.
Facet queries on URI fields are slower than on other field types.
I don't know why.

Arian

_____________________________________________
From: yu shen <sh...@gmail.com>
Sent: Sun Jan 08 23:44:16 GMT-03:00 2012
To: solr-user@lucene.apache.org
Subject: Doing url search in solr is slow


Hi,

My solr document has up to 20 fields, containing data from product name,
date, url etc.

The volume of documents is around 1.5m.

My symptom is when doing url search like [ url:*www.someurl.com*
referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
extraordinary long response time, while search against all other fields,
the response time will be normal.

Can anyone share any insights on this?

Spark