Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/02/08 04:52:44 UTC

struggling with solr.WordDelimiterFilterFactory and periods "." or dots

hello all,

i am struggling with getting solr.WordDelimiterFilterFactory to behave as is
indicated in the solr book (Smiley) on page 54.

the example in the book reads like this:

>>
Here is an example exercising all options:
WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b
<<

essentially - i have the same requirement with embedded periods and need to
return a successful search on a field, even if the user does NOT enter the
period.

i have a field, itemNo that can contain periods ".".

example content in the itemNo field:

B12.0123

when the user searches on this field, they need to be able to enter an
itemNo without the period, and still find the item.

example:

user enters: B120123 and a document is returned with B12.0123.


unfortunately, the search will NOT return the appropriate document, if the
user enters B120123.

however - the search does work if the user enters B12 0123 (a space in place
of the period).

can someone help me understand what is missing from my configuration?


this is snipped from my schema.xml file


  <fields>
     ...
    <field name="itemNo" type="text" indexed="true" stored="true"/>
     ...
  </fields>




    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- the filter in question -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>




--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3724822.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by geeky2 <ge...@hotmail.com>.
hello,

>>
Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?
<<

there is no reference to this in our production schema.

this is extremely confusing.

i am not completely clear on the issue?

reviewing our previous messages - it looks like the data is being tokenized
correctly according to the analysis page and output from Luke.

it also looks like the definition of the field and field type is correct in
the schema.xml

it also looks like there is no errant data (quotes) being introduced in to
the query string submitted to solr:

example:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

so - does the real issue reside in HOW the query is being constructed /
parsed ???

and if so - what drives this query to become a MultiPhraseQuery with
embedded quotes ????

<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>

please note - i also mocked up a simple test on my personal linux box - just
using the solr 3.5 distro (we are using 3.3.0 on our production box under
centOS)

i was able to get a simple test to work and yes - my query does look
different

output from my simple mock up on my personal box:

http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

<lst name="debug">
  <str name="rawquerystring">manu:BP21UAA</str>
  <str name="querystring">manu:BP21UAA</str>
  <str name="parsedquery">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
  <str name="parsedquery_toString">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
  <lst name="explain">

schema.xml

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="manu" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>

any suggestions would be greatly appreciated.

mark





Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, try looking at anything you've changed in solrconfig.xml for the
request handler (probably called "search") that has default="true" set.

Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?
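
A minimal sketch of what that attribute looks like on a field type, in case it helps when scanning schema.xml. It assumes the phrase query really does come from autoGeneratePhraseQueries being enabled, either explicitly or by default. As far as I recall, Solr 3.x turns it on by default when the version attribute on the root schema element is below 1.4, so that attribute is worth checking too. The analyzer chain here is abbreviated for illustration and is not the full chain from the original post.

```xml
<!-- schema.xml (sketch). Assumption: the MultiPhraseQuery is caused by
     autoGeneratePhraseQueries being on, explicitly or by default. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <!-- abbreviated chain for illustration; keep the existing index and query analyzers -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the attribute set to false, a single whitespace-delimited query term that the analyzer splits into several tokens should be parsed as separate optional clauses rather than as one phrase.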

Best
Erick

On Thu, Feb 9, 2012 at 12:02 PM, geeky2 <ge...@hotmail.com> wrote:

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by geeky2 <ge...@hotmail.com>.
>>
OK, first question is why are you searching on two different values?
Is that intentional? 
<<

yes - our users have to be able to locate a part or model number (that may
or may not have periods in that number) even if they do NOT enter the number
with the embedded periods.  

example: 

actual part number in our database is BP2.1UAA

however the user needs to be able to search on BP21UAA and find that part.

there are business reasons why a user may see something different in the
field than is actually in the database.

does this make sense?



>>
If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right? 
<<

yes - simply put - the user MUST get a hit on the above mentioned part if
they enter BP21UAA or BP2.1UAA.

>>
But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
<<

sorry - i did not know i was doing this or how it happened - it was not
intentional and i did not notice this until your posting.  i am not sure of
the implications related to this or what it means to have something as a
MultiPhraseQuery.

>>
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?
<<

i did not use quotes in the url - i cut and pasted the urls for my tests in
the message thread.  i do not see quotes as part of the url in my previous
post.

what would i be looking for in the solrconfig.xml file that would force the
MultiPhraseQuery?

it seems that this is the crux of the issue - but i am not sure how to
determine what is manifesting the quotes?  as previously stated - the quotes
are not being entered via the url - they are pasted (in this message thread)
exactly as i pulled them from the browser.

thank you,
mark






Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by Erick Erickson <er...@gmail.com>.
OK, first question is why are you searching on two different values?
Is that intentional? If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right?

But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?

Because this works fine for me, using your schema definition and
without using quotes. I get, however, this as the parsed query:
eoe:b eoe:12 eoe:0123 eoe:120123 eoe:b120123
not a phrase in sight.
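
For reference, a rough reading of how the WordDelimiterFilterFactory options account for the five tokens in that parsed query. This is a sketch based on the token list above and the documented meaning of each option, not a trace of the filter, and it assumes the query text was B12.0123 (lowercasing happens in a later filter).

```xml
<!-- Sketch: which option contributes which token for B12.0123.
     generateWordParts="1"    emits the alphabetic part:        b
     generateNumberParts="1"  emits the numeric parts:          12, 0123
     catenateNumbers="1"      joins adjacent numeric parts:     120123
     catenateAll="1"          joins every part:                 b120123
     catenateWords="1"        joins adjacent alphabetic parts   (only one here, nothing new)
     splitOnCaseChange="1"    splits on lower-to-upper changes  (none in this value) -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
```

The b120123 token from catenateAll is the one that lets a search without the period match, provided the query is not turned into a phrase.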

If I *do* put quotes around the version without the period, I get
no results returned and a MultiPhraseQuery.

Best
Erick



On Wed, Feb 8, 2012 at 11:54 AM, geeky2 <ge...@hotmail.com> wrote:

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by geeky2 <ge...@hotmail.com>.
hello,

thanks for sticking with me on this ...very frustrating 

ok - i did perform the query with the debug parms using two scenarios:

1) a successful search (where i insert the period / dot) in to the itemNo
field and the search returns a document.

itemNo:BP2.1UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

results from debug

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>

    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP2.1UAA</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>

    <arr name="brand"><str>PHILIPS</str></arr>
    <str name="groupId">0333500</str>
    <str name="id">0333500,1549  ,BP2.1UAA                           </str>
    <str name="itemDesc">PLASMA TELEVISION</str>
    <str name="itemNo">BP2.1UAA                           </str>
    <int name="itemType">2</int>

    <arr name="model"><str>BP2.1UAA                           </str></arr>
    <arr name="productType"><str>Plasma Television^</str></arr>
    <int name="rankNo">0</int>
    <str name="supplierId">1549  </str>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP2.1UAA</str>

  <str name="querystring">itemNo:BP2.1UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 2 (1 21) (uaa bp21uaa)"</str>
  <lst name="explain">
    <str name="0333500,1549  ,BP2.1UAA                           ">
22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
  0.99999994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.02218287 = queryNorm
  22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
    1.0 = tf(phraseFreq=1.0)
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.5 = fieldNorm(field=itemNo, doc=134993)
</str>
  </lst>

  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>

      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">

        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>

      </lst>
    </lst>
    <lst name="process">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">

        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>

      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>

  </lst>
</lst>
</response>







2) a NON-successful search (where i do NOT insert a period / dot) in to the
itemNo field and the search does NOT return a document

 itemNo:BP21UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>

    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP21UAA</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">

  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>

  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>

      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>

      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>

    <lst name="process">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>

      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">

        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
  </lst>
</lst>

</response>

the parsedquery part of the debug output looks like it DOES contain the term
that i am entering for my search criteria on the itemNo field ??

does this make sense?

thank you,
mark




Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, that all looks correct. From the output you pasted, I'd expect
you to be finding the doc.

So next thing: add &debugQuery=on to your query and look at
the debug information after the list of documents, particularly
the "parsedquery" bit. Are you searching against the fields you
think you are? If you don't specify a field, Solr uses the default
defined in schema.xml.
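
A small sketch of where that default lives, for anyone checking their own schema. The field name "text" is a placeholder and is not taken from the poster's schema.

```xml
<!-- schema.xml (sketch): the field searched when a query names no field.
     "text" is a placeholder value, not from the schema in this thread. -->
<defaultSearchField>text</defaultSearchField>
```

Under that setting, q=BP21UAA would be run against the default field, while q=itemNo:BP21UAA targets itemNo explicitly. The parsedquery entry in the debug output shows which field was actually used.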

Next, look at your actual index using either Luke or the TermsComponent
to see what's actually *in* your index rather than what you *think* is in it.
I can't tell you how many times I've made the wrong assumptions.
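
If the TermsComponent is not already wired up, a registration along these lines in solrconfig.xml should do it. This is a sketch modeled on the stock example configuration; the handler name /terms is an assumption and may differ in your setup.

```xml
<!-- solrconfig.xml (sketch): expose the TermsComponent on its own handler.
     The handler name "/terms" is an assumption borrowed from the example config. -->
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

A request such as /terms?terms.fl=itemNo&terms.prefix=bp2 would then list the indexed terms in itemNo that start with bp2, which is a quick way to confirm whether a catenated term like bp21uaa made it into the index.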

My guess would be that you aren't searching the fields you think you are...

Best
Erick

On Wed, Feb 8, 2012 at 9:06 AM, geeky2 <ge...@hotmail.com> wrote:

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by geeky2 <ge...@hotmail.com>.
hello,

thank you for the reply.

yes - i did re-index after the changes to the schema.

also - thank you for the direction on using the analyzer - but i am not sure
if i am interpreting the feedback from the analyzer correctly.

here is what i did:

in the Field value (Index) box - i placed this: BP2.1UAA

in the Field value (Query) box - i placed this: BP21UAA

then after hitting the Analyze button - i see the following:

Under Index Analyzer for: 

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position     1     2     3     4
term text    BP    2     1     UAA
                         21    BP21UAA

Under Query Analyzer for:

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position     1     2     3
term text    BP    21    UAA
                         BP21UAA

the above information leads me to believe that i "should" have BP21UAA as an
indexed term generated from the BP2.1UAA value coming from the database.

also - the query analysis leads me to believe that i "should" find a document
when i search on BP21UAA in the itemNo field

do i have this correct????

am i missing something here?

i am still unable to get a hit when i search on BP21UAA in the itemNo field.

thank you,
mark


Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, seems OK. Did you re-index after any
schema changes?

You'll learn to love admin/analysis for questions like this.
That page shows you what the actual tokenization results are;
make sure to click the "verbose" check boxes.

Best
Erick

On Tue, Feb 7, 2012 at 10:52 PM, geeky2 <ge...@hotmail.com> wrote: