Posted to solr-user@lucene.apache.org by John Blythe <jo...@gmail.com> on 2018/09/09 20:40:04 UTC

504 timeout

hi all. we just migrated to cloud on friday night (woohoo!). everything is
looking good (great!) overall. we did, however, just run into a hiccup.
running a query like this got us a 504 gateway time-out error:

*some* *foo* *bar* *query*

it was about 6 partials with encapsulating wildcards that someone was
running that gave the error. doing 4 or 5 of them worked fine, but upon
adding the last one or two it went kaput. all operations have been zippier
since the migration before doing some of those wildcard queries which took
time (if they worked at all). is this something related directly to our
server configuration, or is there some solr/cloud config'ing that we could
work on that would allow better response to these sorts of queries (though
it'd be at a cost, i'd imagine!)?

thanks for any insight!

best,

--
John Blythe

Re: 504 timeout

Posted by John Blythe <jo...@gmail.com>.
ah, great thought. didn't even think of that. we already have a couple
ngram-based fields. will send over to the stakeholder who was attempting
this.

thanks!

--
John Blythe



Re: 504 timeout

Posted by Erick Erickson <er...@gmail.com>.
First of all, wildcards are evil. Be sure that the reason people are
using wildcards wouldn't be better served by proper tokenizing,
perhaps something like stemming etc.

Assuming that wildcards must be handled though, there are two main strategies:
1> If you want to use leading wildcards, look at
ReversedWildcardFilterFactory. For something like abc* (trailing
wildcard), conceptually Lucene has to construct a big OR query of
every term that starts with "abc". That's not hard and is also pretty
fast: just jump to the first term that starts with "abc" and gather
all of them (they're sorted lexically) until you get to the first term
starting with "abd".

_Leading_ wildcards are a whole 'nother story. *abc means that each
and every distinct term in the field must be enumerated. The first
term could be aaaaaaaaabc and the last term in the field zzzzzzzabc.
There's no way to tell without checking every one.
ReversedWildcardFilterFactory handles this by indexing the term, well,
reversed. So in the above example, not only would the term aaaaaaaaabc
be indexed, but also cbaaaaaaaaa. Now both leading and trailing wildcards
are automagically handled as trailing wildcards: *abc becomes cba* against
the reversed terms.

2> If you must allow leading and trailing wildcards on the same term
(*abc*), consider ngramming; bigrams are usually sufficient. So aaabcde
is indexed as aa, aa, ab, bc, cd, de and searching for *abc* becomes
searching for the phrase "ab bc".
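
To make strategy 1 concrete, here is a rough sketch in plain Python. It is
not Lucene or Solr code: the sorted list only stands in for the term
dictionary, and the function names (prefix_matches, suffix_matches,
build_reversed_terms) are made up for illustration. It shows why abc* is a
cheap range scan and how indexing reversed terms turns *abc into the same
kind of scan.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return every term that starts with `prefix` from a sorted term list."""
    # Jump straight to the first term that could start with the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


def build_reversed_terms(terms):
    """Sorted list of reversed terms (stand-in for the extra reversed tokens)."""
    return sorted(term[::-1] for term in terms)


def suffix_matches(sorted_reversed_terms, suffix):
    """Answer *suffix as a prefix scan over the reversed terms."""
    hits = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(hit[::-1] for hit in hits)


# Hypothetical term dictionary for one field.
terms = sorted(["aaaaaaaaabc", "abc", "abcd", "abd", "xabc", "zzzzzzzabc"])
reversed_terms = build_reversed_terms(terms)

print(prefix_matches(terms, "abc"))           # the abc* case
print(suffix_matches(reversed_terms, "abc"))  # the *abc case
```

Without the reversed copy, the *abc case would have to test every term in
the list, which is the full enumeration described above.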

Both of these make the index larger, but usually by surprisingly
little. People will also index these variants in separate fields upon
occasion; it depends on the use cases you need to support. Ngramming, for
instance, would find "ab" in the above even with no wildcards in the
query....
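
The bigram idea can be sketched the same way. Again this is plain Python
under stated assumptions, not Solr's analysis chain: the dict-based index
and the names bigrams, index_terms and infix_matches are hypothetical. It
shows *abc* becoming a positional (phrase) match on "ab bc", and why a bare
"ab" lookup against the ngram field matches more than you might expect.

```python
def bigrams(term):
    """Split a term into overlapping two-character grams."""
    return [term[i:i + 2] for i in range(len(term) - 1)]


def index_terms(terms):
    """Map each bigram to {term: [positions where the bigram occurs]}."""
    index = {}
    for term in terms:
        for pos, gram in enumerate(bigrams(term)):
            index.setdefault(gram, {}).setdefault(term, []).append(pos)
    return index


def infix_matches(index, infix):
    """Find terms containing `infix` by matching its bigrams as a phrase."""
    grams = bigrams(infix)
    if not grams:
        return []  # a one-character infix has no bigrams; not handled here
    hits = []
    for term, starts in index.get(grams[0], {}).items():
        for start in starts:
            # Every following bigram must sit at the next position.
            if all(start + k in index.get(gram, {}).get(term, [])
                   for k, gram in enumerate(grams)):
                hits.append(term)
                break
    return sorted(hits)


index = index_terms(["aaabcde", "abd", "xbcx"])

print(bigrams("aaabcde"))           # the grams indexed for aaabcde
print(infix_matches(index, "abc"))  # the *abc* case, as the phrase "ab bc"
print(sorted(index["ab"]))          # a bare "ab" lookup hits every term containing it
```

The last line is the reason a separate field for the ngrams is often worth
it: ordinary searches keep hitting the normally tokenized field.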

Best,
Erick