Posted to solr-user@lucene.apache.org by Chamnap Chhorn <ch...@gmail.com> on 2012/07/02 18:35:05 UTC

How to improve this solr query?

Hi all,

I'm using Solr 3.5 with a nested query on a 4-core CPU server with 17 GB of
memory. The problem is that my query is very slow: the average response time
is 12 seconds against 13 million documents.

I send a quoted string (q2) to the string fields and a non-quoted string (q1)
to the other fields, then combine the results.

facet=true&sort=score+desc&q2=*"apartment"*&facet.mincount=1&q1=*apartment*
&tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=
*
_query_:+"{!dismax+qf='.....'+fq='......'+v=$q1}"+OR+_query_:+"{!dismax+qf='......'+fq='.......'+v=$q2}"
*
&facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid

I have already run a Solr optimize, but it's still slow. Any idea how to
improve the speed? Have I done anything wrong?
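
[Editor's sketch] For readers who want to reproduce the request, the parameters above could be assembled roughly as follows. This is a minimal Python sketch, not the poster's code. The qf and fq values are elided in the post, so they stay caller-supplied arguments here; the function names `nested_query` and `build_params` are hypothetical. The asterisks around the q1, q2 and q values are left out on the assumption that they are bold-face markers and not part of the query.

```python
from urllib.parse import urlencode


def nested_query(text_qf, string_qf, text_fq, string_fq):
    """Build the value of q: two nested dismax queries joined with OR.

    $q1 and $q2 are dereferenced by Solr from the q1 and q2 request parameters.
    """
    first = "_query_:\"{!dismax qf='%s' fq='%s' v=$q1}\"" % (text_qf, text_fq)
    second = "_query_:\"{!dismax qf='%s' fq='%s' v=$q2}\"" % (string_qf, string_fq)
    return first + " OR " + second


def build_params(term, text_qf, string_qf, text_fq, string_fq):
    """Return the URL-encoded query string for the request in the post."""
    params = [
        ("q", nested_query(text_qf, string_qf, text_fq, string_fq)),
        ("q1", term),                 # unquoted, for the non-string fields
        ("q2", '"%s"' % term),        # quoted, for the string fields
        ("q.alt", "*:*"),
        ("tie", "0.1"),
        ("sort", "score desc"),
        ("start", "0"),
        ("rows", "20"),
        ("fl", "uuid"),
        ("wt", "json"),
        ("version", "2.2"),
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.query", "has_map: true"),
        ("facet.query", "has_image: true"),
        ("facet.query", "has_website: true"),
        ("facet.field", "{!ex=dt}sub_category_uuids"),
        ("facet.field", "{!ex=dt}location_uuid"),
    ]
    # urlencode handles the escaping that was done by hand in the post
    return urlencode(params)
```

Building the list of pairs first keeps repeated parameters such as facet.query and facet.field explicit, and leaves all escaping to `urlencode`.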

-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Chamnap Chhorn <ch...@gmail.com>.

Hi Erick and Michael,

It's not an asterisk at all. Sorry for the confusion; it's actually the
dot character.
I wrote it that way because the parameter contains quite a lot of fields.

The reason I'm doing this is that I have some string fields and some
non-string fields. The idea is to send the quoted value to the string fields
and the non-quoted value to the non-string fields. I have to do that in order
to match the string fields.

I have tried using pf, but it doesn't match the string field at all. Do you
know of any good resource on how to use pf? I looked into several recent
Solr books, but they say very little about it.

On Wed, Jul 4, 2012 at 3:51 AM, Erick Erickson <er...@gmail.com> wrote:

> Chamnap:
>
> I've seen various e-mail programs put the asterisk in for terms that
> are in bold face.
>
> The queries you pasted have lots of "*" characters in them. I suspect
> they were just things you put in bold in your original, which may be the
> source of the confusion about whether you were using wildcards.
>
> But on to your question. If your q1 and q2 are the same words,
> wouldn't it just work to specify the "pf" (phrase fields) parameter for
> edismax? That automatically takes the terms in the query and turns them
> into a phrase query that's boosted higher.
>
> And what's the use case here? I think you might be making this more
> complex than it needs to be.
>
> Best
> Erick



-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Erick Erickson <er...@gmail.com>.

Chamnap:

I've seen various e-mail programs put the asterisk in for terms that
are in bold face.

The queries you pasted have lots of "*" characters in them. I suspect
they were just things you put in bold in your original, which may be the
source of the confusion about whether you were using wildcards.

But on to your question. If your q1 and q2 are the same words,
wouldn't it just work to specify the "pf" (phrase fields) parameter for
edismax? That automatically takes the terms in the query and turns them
into a phrase query that's boosted higher.
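
[Editor's sketch] A request along the lines suggested here might be built as below. This is a minimal Python sketch under stated assumptions: the function name `edismax_params` and the field names in the usage note are hypothetical, and only the standard request parameters defType, q, qf, pf and tie are used. One caveat that matters for this thread: pf only boosts documents that already match through qf; it does not add matches of its own.

```python
from urllib.parse import urlencode


def edismax_params(user_query, qf, pf):
    """Return an encoded edismax request using one query string for both roles."""
    return urlencode([
        ("defType", "edismax"),
        ("q", user_query),   # sent once, unquoted
        ("qf", qf),          # fields searched term by term
        ("pf", pf),          # fields where the whole query is also tried as a phrase
        ("tie", "0.1"),
    ])
```

For example, `edismax_params("serviced apartment", "name description", "name^10")` searches the hypothetical fields name and description and boosts documents whose name contains the full phrase.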

And what's the use case here? I think you might be making this more complex
than it needs to be.

Best
Erick

On Tue, Jul 3, 2012 at 8:41 AM, Michael Della Bitta
<mi...@appinions.com> wrote:
> Chamnap,
>
> I have a hunch you can get away with not using *s.
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>

Re: How to improve this solr query?

Posted by Michael Della Bitta <mi...@appinions.com>.

Chamnap,

I have a hunch you can get away with not using *s.

Michael Della Bitta

------------------------------------------------
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, Jul 3, 2012 at 2:16 AM, Chamnap Chhorn <ch...@gmail.com> wrote:
> Lance, I didn't use wildcards at all. I use only this; the only difference
> is whether the value is quoted.
>
> q2=*"apartment"*
> q1=*apartment*

Re: How to improve this solr query?

Posted by Chamnap Chhorn <ch...@gmail.com>.

Lance, I didn't use wildcards at all. I use only this; the only difference
is whether the value is quoted.

q2=*"apartment"*
q1=*apartment*

On Tue, Jul 3, 2012 at 12:06 PM, Lance Norskog <go...@gmail.com> wrote:

> &q2=*"apartment"*
> q1=*apartment*
>
> These are wildcards



-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Lance Norskog <go...@gmail.com>.

&q2=*"apartment"*
q1=*apartment*

These are wildcards

On Mon, Jul 2, 2012 at 8:30 PM, Chamnap Chhorn <ch...@gmail.com> wrote:
> Hi Lance,
>
> I didn't use wildcards at all. This is a normal text search only. I need a
> string field because the value needs to be matched exactly, and it is
> sometimes multi-word, so quoting it is necessary.
>
> By the way, if I do a super plain query, it takes at least 600ms. I'm not
> sure why. On another solr instance with similar amount of data, it takes
> only 50ms.
>
> I see something strange on the response, there is always
>
> <str name="command">build</str>
>
> What does that mean?



-- 
Lance Norskog
goksron@gmail.com

Re: How to improve this solr query?

Posted by Chamnap Chhorn <ch...@gmail.com>.

Hi Lance,

I didn't use wildcards at all. This is a normal text search only. I need a
string field because the value needs to be matched exactly, and it is
sometimes multi-word, so quoting it is necessary.
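
[Editor's sketch] The need for quoting can be illustrated with a toy model in Python. This is not Solr's analysis chain; it only assumes that a string field is indexed as a single untokenized term and that unquoted input is split on whitespace before being compared with it. The function names and the sample value are hypothetical.

```python
def parse_terms(user_query):
    """Quoted input stays one term; unquoted input is split on whitespace."""
    q = user_query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return [q[1:-1]]
    return q.split()


def string_field_hit(indexed_value, user_query):
    """A string field matches only when a query term equals the whole stored value."""
    return any(term == indexed_value for term in parse_terms(user_query))


def text_field_hit(indexed_value, user_query):
    """A tokenized text field matches when every query word is one of its tokens."""
    tokens = set(indexed_value.lower().split())
    return all(word.lower() in tokens for word in parse_terms(user_query)[0].split()) \
        if user_query.strip().startswith('"') \
        else all(word.lower() in tokens for word in parse_terms(user_query))
```

In this model, `string_field_hit("serviced apartment", "serviced apartment")` is false because neither word equals the whole value, while the quoted form `'"serviced apartment"'` matches. For a single word such as apartment the quotes make no difference.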

By the way, even a very plain query takes at least 600 ms, and I'm not
sure why. On another Solr instance with a similar amount of data, it takes
only 50 ms.

I see something strange in the response; there is always

<str name="command">build</str>

What does that mean?
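
[Editor's sketch] To compare the two instances, it may help to pull the server-side time and any "command" entry out of each response. The Python sketch below assumes wt=json and a top-level "command" key; the function name `summarize` and the sample document are hypothetical. One thing worth ruling out, offered as an assumption and not a diagnosis: Solr's spellcheck component reports a build command in the response when spellcheck.build=true is in effect (for instance through request handler defaults), and rebuilding on every request would be costly.

```python
import json


def summarize(raw_json):
    """Extract the reported query time and any top-level command entry."""
    body = json.loads(raw_json)
    return {
        "qtime_ms": body["responseHeader"]["QTime"],  # time measured by Solr itself
        "command": body.get("command"),               # None when no such entry exists
    }


# Hypothetical response, shaped after the numbers mentioned in the message
SAMPLE = (
    '{"responseHeader": {"status": 0, "QTime": 600},'
    ' "command": "build",'
    ' "response": {"numFound": 0, "start": 0, "docs": []}}'
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```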

On Tue, Jul 3, 2012 at 10:02 AM, Lance Norskog <go...@gmail.com> wrote:

> Wildcards are slow. Leading wildcards are even slower. Is there
> some way to search that data differently? If it is a string, can you
> change it to a text field and make sure 'apartment' is a separate
> word?



-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Lance Norskog <go...@gmail.com>.

Wildcards are slow. Leading wildcards are even slower. Is there
some way to search that data differently? If it is a string, can you
change it to a text field and make sure 'apartment' is a separate
word?
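
[Editor's sketch] The cost difference can be shown with a toy term dictionary in Python. This is a simplified model, not Lucene's implementation: it assumes terms are kept in sorted order, so a trailing wildcard can seek to its prefix while a leading wildcard has to look at every term. Function names and sample terms are hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Trailing wildcard (apart*): seek to the prefix, stop at the first miss."""
    out = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


def suffix_matches(sorted_terms, suffix):
    """Leading wildcard (*ment): sort order does not help, every term is checked."""
    return [t for t in sorted_terms if t.endswith(suffix)]
```

With 13 million documents the number of distinct terms can be large, so the full scan in the second function is the part that hurts in this model.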

On Mon, Jul 2, 2012 at 10:01 AM, Chamnap Chhorn <ch...@gmail.com> wrote:
> Hi Michael,
>
> Thanks for the quick response. Based on the documentation, "facet.mincount"
> means that Solr will return only facet values that have at least that count.
> For me, I just want to ensure my facet counts don't include zero values.
>
> I tried increasing it to 10, but it's still slow, even for the same query.
>
> Actually, those 13 million documents are divided into 200 portals. I
> already include "fq=portal_uuid: kjkjkjk" inside each nested query, but
> it's still slow.



-- 
Lance Norskog
goksron@gmail.com

Re: How to improve this solr query?

Posted by Chamnap Chhorn <ch...@gmail.com>.

Hi Michael,

Thanks for the quick response. Based on the documentation, "facet.mincount"
means that Solr will return only facet values that have at least that count.
For me, I just want to ensure my facet counts don't include zero values.
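
[Editor's sketch] The documented effect of facet.mincount can be modeled in a few lines of Python. The function name `apply_mincount` and the sample counts are hypothetical, and the model covers only which values are returned; it says nothing about how much work Solr does internally to produce the counts.

```python
def apply_mincount(counts, mincount):
    """Keep only facet values whose count is at least mincount."""
    return {value: n for value, n in counts.items() if n >= mincount}


# Hypothetical counts for one facet field
SAMPLE_COUNTS = {"loc-a": 3, "loc-b": 0, "loc-c": 12}
```

Here `apply_mincount(SAMPLE_COUNTS, 1)` drops only the zero entry, and a threshold of 10 keeps just loc-c.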

I tried increasing it to 10, but it's still slow, even for the same query.

Actually, those 13 million documents are divided into 200 portals. I
already include "fq=portal_uuid: kjkjkjk" inside each nested query, but
it's still slow.
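
[Editor's sketch] An alternative worth testing is to send the portal restriction as a top-level fq request parameter, where Solr applies it as a filter query and can cache it separately from the main query. Whether an fq placed inside the local params of a nested dismax query is honored at all is something I am not sure of, so treat that as an open question to verify. The Python function name `with_portal_filter` is hypothetical; the field name and sample value come from the message above.

```python
from urllib.parse import urlencode


def with_portal_filter(base_params, portal_uuid):
    """Append a top-level fq restricting results to a single portal."""
    params = list(base_params) + [("fq", "portal_uuid:%s" % portal_uuid)]
    return urlencode(params)
```

For example, `with_portal_filter([("q", "apartment")], "kjkjkjk")` yields a request with the filter applied once to the whole query instead of once per nested query.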

On Mon, Jul 2, 2012 at 11:47 PM, Michael Della Bitta <
michael.della.bitta@appinions.com> wrote:

> Hi Chamnap,
>
> The first thing that jumped out at me was "facet.mincount=1". Are you
> sure you need this? Increasing this number should drastically improve
> speed.
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>
>



-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Michael Della Bitta <mi...@appinions.com>.

Hi Chamnap,

The first thing that jumped out at me was "facet.mincount=1". Are you
sure you need this? Increasing this number should drastically improve
speed.

Michael Della Bitta

------------------------------------------------
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com



Re: How to improve this solr query?

Posted by Chamnap Chhorn <ch...@gmail.com>.

Hi Amit,

Thanks for your response.
1. Sometimes I saw that solr didn't sort by score desc, so I added the sort
explicitly. I will have to check that again.
2. Both q1 and q2 do the searching, just on different fields. A string
field must match exactly, so solr needs the value of the q parameter to be
quoted. I combine the two with a nested query using the OR operator.

I'll look into the bf, pf and bq parameters more.

Thanks for the advice. :)
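
For reference, the nested query described in point 2 can be sketched in
Python as follows. The field names passed in are placeholders (the real qf
lists are elided in this thread), and the helper name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def nested_dismax_params(term, text_fields, string_fields):
    """Build q, q1 and q2 for two dismax sub-queries joined with OR."""
    q = (
        "_query_:\"{!dismax qf='%s' v=$q1}\" OR "
        "_query_:\"{!dismax qf='%s' v=$q2}\""
    ) % (" ".join(text_fields), " ".join(string_fields))
    return [
        ("q", q),
        ("q1", term),           # unquoted value for the analyzed text fields
        ("q2", '"%s"' % term),  # quoted value for exact match on string fields
    ]


# Placeholder field names, for illustration only.
params = nested_dismax_params(
    "apartment", ["name", "description"], ["category_exact"]
)
query_string = urlencode(params)
decoded = parse_qs(query_string)
```

The $q1 and $q2 references inside the local params are resolved by solr from
the request parameters of the same name.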

On Wed, Jul 4, 2012 at 2:28 PM, Amit Nithian <an...@gmail.com> wrote:

> Couple questions:
> 1) Why are you explicitly telling solr to sort by score desc,
> shouldn't it do that for you? Could this be a source of performance
> problems since sorting requires the loading of the field caches?
> 2) Of the query parameters, q1 and q2, which one is actually doing
> "text" searching on your index? It looks like q1 is doing non-string
> related stuff, could this be better handled in either the bf or bq
> section of the edismax config? Looking at the sample though I don't
> understand how q1=apartment would hit non-string fields again (but see
> #3)
> 3) Are the "string" fields literally of string type (i.e. no analysis
> on the field) or are you saying string loosely to mean "text" field.
> pf ==> phrase fields ==> given a multiple word query, will ensure that
> the specified phrase exists in the specified fields separated by some
> slop ("hello my world" may match "hello world" depending on this slop
> value). The "qf" means that given a multi term query, each term exists
> in the specified fields (name, description whatever text fields you
> want).
>
> Best
> Amit
>



-- 
Chhorn Chamnap
http://chamnap.github.com/

Re: How to improve this solr query?

Posted by Amit Nithian <an...@gmail.com>.

A couple of questions:
1) Why are you explicitly telling solr to sort by score desc? Shouldn't it
do that for you? Could this be a source of performance problems, since
sorting requires the loading of the field caches?
2) Of the query parameters, q1 and q2, which one is actually doing
"text" searching on your index? It looks like q1 is doing non-string
related stuff; could this be better handled in either the bf or bq
section of the edismax config? Looking at the sample, though, I don't
understand how q1=apartment would hit non-string fields (but see #3).
3) Are the "string" fields literally of string type (i.e. no analysis
on the field), or are you using "string" loosely to mean a "text" field?

pf ==> phrase fields ==> given a multiple word query, boosts documents in
which the whole query occurs as a phrase in the specified fields, with the
terms separated by at most some slop ("hello my world" may match "hello
world" depending on this slop value). pf only affects scoring; it does not
decide which documents match. The "qf" lists the fields in which each term
of a multi term query is searched (name, description, whatever text fields
you want).
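
A request that uses qf together with pf and the phrase slop parameter ps
could be put together roughly like this. It is only a Python sketch: the
field names and the helper are made up for illustration, and the slop value
would need tuning against real data.

```python
from urllib.parse import urlencode, parse_qs


def dismax_params(user_query, query_fields, phrase_fields, phrase_slop):
    """Build dismax parameters that search qf and boost phrase hits in pf."""
    return [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", " ".join(query_fields)),   # fields each term is searched in
        ("pf", " ".join(phrase_fields)),  # fields where a phrase hit is boosted
        ("ps", str(phrase_slop)),         # slop allowed for the pf phrase
    ]


# Hypothetical field names.
params = dismax_params(
    "hello world", ["name", "description"], ["name", "description"], 1
)
query_string = urlencode(params)
decoded = parse_qs(query_string)
```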

Best
Amit
