Posted to solr-user@lucene.apache.org by John Blythe <jo...@curvolabs.com> on 2017/04/11 15:35:12 UTC

simple matches not catching at query time

hi everyone.

i recently wrote in ('analysis matching, query not') but never heard back
so wanted to follow up. i'm at my wit's end currently. i have several
fields that are showing matches in the analysis tab. when i dumb down the
string sent over to query it still gives me issues in some field cases.

any thoughts on how to debug to figure out wtf is going on here would be
greatly appreciated. the use case is straightforward and the solution
should be as well, so i'm at a loss as to how in the world i'm having
issues w this.

can provide any amount of contextualizing information you need, just let me
know what could be beneficial.

best,

john

Re: simple matches not catching at query time

Posted by Mikhail Khludnev <mk...@apache.org>.
John,

Double quotes signal a phrase query (and round brackets inside double
quotes are a horrible beast to think about). Since the parsed query is a
disjunction of phrases and the shingle, it has no chance of matching any of
the indexed values from the screenshots. You probably need to flip
autoGeneratePhraseQueries
(see
https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties
)
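
To see which clauses of a dismax query came out as phrases, the
parsedquery_toString value from the debug output can be pulled apart per
field. A minimal Python sketch, assuming the string has the
"+(a | b | ...) ()" shape shown in this thread. The helper name
split_dismax_clauses is made up for illustration and is not part of Solr.

```python
def split_dismax_clauses(parsed):
    """Return (field, text, is_phrase) for each clause of a '+(a | b | ...)' string."""
    start = parsed.index("(")
    depth = 0
    end = None
    # Find the parenthesis that closes the first opening one.
    for i in range(start, len(parsed)):
        if parsed[i] == "(":
            depth += 1
        elif parsed[i] == ")":
            depth -= 1
            if depth == 0:
                end = i
                break
    if end is None:
        raise ValueError("unbalanced parentheses")
    clauses = []
    for clause in parsed[start + 1:end].split(" | "):
        # Only the first colon separates the field name from the clause text.
        field, _, text = clause.partition(":")
        # A clause printed inside double quotes is a (multi)phrase query.
        clauses.append((field, text, text.startswith('"')))
    return clauses


# The parsedquery_toString value from this thread, with line wraps joined.
parsed = ('+(manufacturer_syn:"zimmer zimmer" | manufacturer_s:ZIMMER:ZIMMER US | '
          'manufacturer_split_syn:"zimmer zimmer" | '
          'manufacturer_syn_both:"(zimmer_zimmer_us zimmer) zimmer" | '
          'manufacturer_text:"zimmer zimmer us") ()')

for field, text, is_phrase in split_dismax_clauses(parsed):
    print(field, "PHRASE" if is_phrase else "term", text)
```

Every clause flagged as PHRASE needs all of its terms at adjacent positions
in a single indexed value, which is why a document indexed as just "zimmer"
is not hit by any of them.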

In reply to John Blythe's message of Wed, Apr 12, 2017 at 2:52 PM.



-- 
Sincerely yours
Mikhail Khludnev

Re: simple matches not catching at query time

Posted by John Blythe <jo...@curvolabs.com>.
you can view some of my analyses here that have caused me grief and
confusion: http://imgur.com/a/Fcht3

here is a debug output:

"rawquerystring":"\"ZIMMER:ZIMMER US\"",
"querystring":"\"ZIMMER:ZIMMER US\"",
"parsedquery":"(+DisjunctionMaxQuery((manufacturer_syn:\"zimmer zimmer\" | manufacturer_s:ZIMMER:ZIMMER US | manufacturer_split_syn:\"zimmer zimmer\" | manufacturer_syn_both:\"(zimmer_zimmer_us zimmer) zimmer\" | manufacturer_text:\"zimmer zimmer us\")) ())/no_coord",
"parsedquery_toString":"+(manufacturer_syn:\"zimmer zimmer\" | manufacturer_s:ZIMMER:ZIMMER US | manufacturer_split_syn:\"zimmer zimmer\" | manufacturer_syn_both:\"(zimmer_zimmer_us zimmer) zimmer\" | manufacturer_text:\"zimmer zimmer us\") ()",
"explain":{},


is it the quotes that are getting things screwy? i'm not entirely versed on
how to interpret the raw and parsed query data here. does \"zimmer zimmer\"
mean that lucene is receiving that shingle rather than 'zimmer' (implicit
OR) 'zimmer'? if so, then i'm not understanding why that's happening bc
some of these have WDF that is generating word parts.

aside: i've changed the server-side code used to send the query to split on
the colon and send over as separate tokens wrapped in quotes. in the case
above, field:("VENDOR:VENDOR US") becomes field:("VENDOR" "VENDOR US")
which successfully solves my immediate problem. that said, i'd really like
to understand better where things are going wrong w the above _and_ learn
better how to debug my queries.
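
The workaround described above could look roughly like this on the client
side. This is a Python sketch only: the function name and the escaping of
embedded double quotes are assumptions, not the actual server-side code, and
backslashes or other query-parser special characters are not handled.

```python
def split_on_colon_clause(field, raw_value):
    """Build field:("A" "B") from a raw value such as 'A:B'."""
    # Split on the colon and drop empty pieces (e.g. from a trailing colon).
    parts = [p.strip() for p in raw_value.split(":") if p.strip()]
    # Wrap each piece in double quotes, escaping any quotes it contains.
    quoted = " ".join('"{}"'.format(p.replace('"', '\\"')) for p in parts)
    return "{}:({})".format(field, quoted)


print(split_on_colon_clause("field", "VENDOR:VENDOR US"))
# field:("VENDOR" "VENDOR US")
```

Each quoted piece becomes its own phrase clause, so a document only has to
satisfy one of them (with the default OR operator) instead of one long phrase.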

i need to get the TermsComponent used to find what is being indexed so i
can report back on that and then can share the list of items requested by
alessandro.

thanks all!


In reply to alessandro.benedetti's message of Wed, Apr 12, 2017 at 5:26 AM.

Re: simple matches not catching at query time

Posted by "alessandro.benedetti" <a....@sease.io>.
hi John, I am a bit confused here.

Let's focus on one field and one document.

Given this parsed phrase query :

manufacturer_split_syn:"vendor vendor"

and the document 1 :
D1
{"id":"1",
"manufacturer_split_syn" : "vendor"}

Are you expecting this to match ?
because it shouldn't ...
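
To make the point concrete: a phrase query needs all of its terms at
consecutive positions in the same field value. A toy Python sketch of that
rule follows. It is not Lucene's implementation, and it ignores slop and
tokens stacked at the same position (as synonyms produce).

```python
def phrase_matches(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


# Document D1 has a single token, the phrase query has two.
print(phrase_matches(["vendor"], ["vendor", "vendor"]))  # False
# A single-term query against the same document does match.
print(phrase_matches(["vendor"], ["vendor"]))            # True
```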

let's try to formulate the problem in this way, with less explaining and
more step by step :

Original Query :
Parsed Query:
Document indexed :
Terms in the index : 

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: simple matches not catching at query time

Posted by Mikhail Khludnev <mk...@apache.org>.
John,

By id:<expected> I mean a query that matches the doc you expect the
problem query to match.
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-TheexplainOtherParameter

In reply to John Blythe's message of Tue, Apr 11, 2017 at 11:32 PM.



-- 
Sincerely yours
Mikhail Khludnev

Re: simple matches not catching at query time

Posted by John Blythe <jo...@curvolabs.com>.
first off, i don't think i have a full handle on the import of what is
outputted by the debugger.

that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is
matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
match. the query analyzer is keywordtokenizer, pattern replacement
(replaces all non-alphanumeric with underscores), checks for synonyms (the
underscores are my way around the multi term synonym problem), then
worddelimiter is used to blow out the underscores and generate word parts
("vendor_vendor" => 'vendor' 'vendor'), stop filter, lower case, stem.
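
The chain described above can be approximated in a few lines of Python to
see which tokens come out. This is only a rough simulation under stated
assumptions: the synonym, stop and stemming steps are left out, and the real
WordDelimiterFilter has many more options than a plain split on underscores.

```python
import re


def analyze(value):
    """Rough stand-in for: keyword tokenizer -> pattern replace -> word delimiter -> lowercase."""
    # Keyword tokenizer: the whole input is treated as one token.
    token = value
    # Pattern replacement: every non-alphanumeric character becomes an underscore.
    token = re.sub(r"[^A-Za-z0-9]", "_", token)
    # Word delimiter (simplified): split on underscores, dropping empty parts.
    parts = [p for p in token.split("_") if p]
    # Lowercase filter.
    return [p.lower() for p in parts]


print(analyze("VENDOR:VENDOR US"))
# ['vendor', 'vendor', 'us']
```

The quoted input produces several tokens at successive positions, and those
are what end up inside the PhraseQuery. The debug output in this thread shows
only "vendor vendor" for manufacturer_split_syn, so a step not simulated here
presumably drops "us".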

in your mentioned strategy, what is the "id:<expected>" representative of?

thanks!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | john@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

In reply to Mikhail Khludnev's message of Tue, Apr 11, 2017 at 4:12 PM.

Re: simple matches not catching at query time

Posted by Mikhail Khludnev <mk...@apache.org>.
John,

How do you expect any of "parsed_filter_queries":["
MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
to match against
vendor_coolmed | coolmed | vendor ?

I just can't see any chance of them matching.

One possible strategy is to pick the simplest filter query and use it as the
main query.
Then pass &explainOther=id:<expected> and share the explanation.
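
A sketch of the request this suggests, built with Python's standard library.
The host, the core name "mycore" and the document id 123 are placeholders.
The sketch assumes debugging is switched on with debugQuery=true so that the
explainOther section appears in the response.

```python
from urllib.parse import urlencode


def explain_other_url(base, query, expected_id):
    """Build a /select URL asking Solr to explain a doc the query was expected to hit."""
    params = {
        "q": query,                                  # the simplest filter query, used as main query
        "debugQuery": "true",                        # turn on debug output
        "explainOther": "id:{}".format(expected_id), # the doc that should have matched
        "wt": "json",
    }
    return base + "/select?" + urlencode(params)


url = explain_other_url("http://localhost:8983/solr/mycore",  # placeholder host and core
                        'manufacturer_split_syn:"Vendor:Vendor US"',
                        "123")                                 # placeholder document id
print(url)
```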



In reply to John Blythe's message of Tue, Apr 11, 2017 at 8:57 PM.



-- 
Sincerely yours
Mikhail Khludnev

Re: simple matches not catching at query time

Posted by John Blythe <jo...@curvolabs.com>.
hi, erick.

appreciate the feedback.

1> i'm sending the terms to solr enquoted
2> i'd thought that at one point and reran the indexing. i _had_ had two of
the fields not indexed, but this represented one pass (same analyzer) from
two diff source fields while 2 or 3 of the other 4 fields _were_ seeming as
if they should match. maybe just need to do this for said sanity at this
point lol
3> i'm using dismax, no mm param set

some further context:

i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR US")
OR manufacturer_syn:("VENDOR:VENDOR US")...

The indexed value is: "Vendor"

the output of field 1 in the Analysis tab would be:
*index*: vendor_coolmed | coolmed | vendor
*query*: vendor_vendor_coolmed | vendor | vendor

the other field (and a couple other, related ones, actually) have similar
situations where I see a clear match (as well as get the confirmation of it
when switching to the old UI and seeing the highlighting) yet get no
results in my actual query.

a further note. when i get the query debugging enabled I can see this in
the output:
"filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
    "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
"parsed_filter_queries":["MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor) vendor\")",
    "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...

It looks as if the parsed query is wrapped in quotes even after having been
parsed, so while the correct tokens, i.e. "vendor", are present to match
against the indexed value, the fact that the entire parsed derivative of
the initial query is sent to match (if that's indeed what's happening)
won't actually get any hits. Yet if I remove the quotes when sending over
to query then the parsing doesn't get to a point of having any
worthwhile/matching tokens to begin with.

one last thing: i've attempted with just "vendor" being sent over to help
remove complexity and, once more, i see Analysis chain functioning just
fine but the query itself getting 0 hits.

think TermsComponent is the best option at this point or something else
given the above filler info?

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | john@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

In reply to Erick Erickson's message of Tue, Apr 11, 2017 at 1:20 PM.

Re: simple matches not catching at query time

Posted by Erick Erickson <er...@gmail.com>.
&debug=query is your friend. There are several issues that often trip people up:

1> The analysis tab pre-supposes that what you put in the boxes gets
all the way to the field in question. Trivial example:
I put (without quotes) "erick erickson" in the "name" field in the
analysis page and see that it gets tokenized correctly. But the query
"name:erick erickson" actually gets parsed at a higher level into
name:erick default_search_field:erickson. See the discussion at:
SOLR-9185

2> what you think is in your indexed field isn't really. Can happen if
you changed your analysis chain but didn't totally re-index. Can
happen because one of the parts of the analysis chain works
differently than you expect (WordDelimiterFilterFactory, for instance,
has a ton of options that can alter the tokens emitted). The
TermsComponent will let you examine the terms actually _in_ the index
that you search on. You stated that the analysis page shows you what
you expect, so this is a sanity check.

3> You're using edismax and setting some parameter, mm=100% is a
favorite and it's having this effect.

So add debug=query and provide a sample document (or just a field) and
the schema definition for the field in question if you're still
stumped.
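
The two checks above (debug=query and the TermsComponent) can be scripted.
A minimal Python sketch that only builds the request URLs: the host and core
name are placeholders, and it assumes a /terms request handler is available
in the Solr configuration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/mycore"  # placeholder host and core name


def debug_query_url(q, fq=None):
    """URL that returns the parsed query (debug=query) without fetching any rows."""
    params = [("q", q), ("debug", "query"), ("rows", "0"), ("wt", "json")]
    if fq:
        params.append(("fq", fq))
    return BASE + "/select?" + urlencode(params)


def terms_url(field, prefix="", limit=20):
    """URL that lists terms actually present in the index for one field."""
    params = [("terms", "true"), ("terms.fl", field),
              ("terms.limit", str(limit)), ("wt", "json")]
    if prefix:
        params.append(("terms.prefix", prefix))
    return BASE + "/terms?" + urlencode(params)


print(debug_query_url("*:*", fq='manufacturer_split_syn:("Vendor:Vendor US")'))
print(terms_url("manufacturer_split_syn", prefix="vend"))
```

Comparing the terms listed for a field with the clauses in the parsed query
shows whether the query can match at all, independent of what the analysis
page displays.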

Best,
Erick

In reply to John Blythe's message of Tue, Apr 11, 2017 at 8:35 AM.