
Posted to solr-user@lucene.apache.org by vicky desai <vi...@germinait.com> on 2013/08/16 13:23:27 UTC

struggling with solr.WordDelimiterFilterFactory

Hi All,

I have a query regarding the use of wordDelimiterFilterFactory.  My schema
definition for the text field is as follows

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="Content" type="text" indexed="true" stored="true" multiValued="false"/>

If I run the query q=Content:speedPost

then docs whose Content contains *speed post* match, which is as expected, but
docs whose Content contains *speedpost* do not match.

Can anybody please point out where I am going wrong?



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Jack Krupansky <ja...@basetechnology.com>.

Have you made ANY changes to the analyzer since indexing the data? 
Generally, you need to completely reindex your data after any changes to a 
field type analyzer.

Otherwise, run the Solr Admin UI Analyzer web page and check the output for 
both index and query.

Also, be aware that preserveOriginal will preserve index-time punctuation 
such as trailing comma, period, or enclosing parentheses.

Also, what is your default query operator? You need to use q.op=OR when 
using WDF to generate multiple, non-phrase terms at query time.

Also add debugQuery=true to your request and see what the generated parse 
query looks like.
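
For readers who would rather pin the operator in configuration than pass q.op on every request, here is a minimal sketch of handler defaults in solrconfig.xml. The handler name "/select" and the default field "Content" are assumptions borrowed from the examples in this thread, not taken from the poster's actual config.

```xml
<!-- solrconfig.xml: sketch only; handler name and df value are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- OR lets any one of the WDF-generated terms match -->
    <str name="q.op">OR</str>
    <!-- default field used when the query does not name one -->
    <str name="df">Content</str>
  </lst>
</requestHandler>
```

debugQuery=true is probably better left as a per-request parameter, since it adds work to every response.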

-- Jack Krupansky

-----Original Message----- 
From: vicky desai
Sent: Friday, August 16, 2013 7:51 AM
To: solr-user@lucene.apache.org
Subject: Re: struggling with solr.WordDelimiterFilterFactory

Hi,

Another example I found: q=Content:wi-fi doesn't match documents containing
the word wifi. I think it is not catenating the query keywords correctly.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085030.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Jack Krupansky <ja...@basetechnology.com>.

I just wanted other readers to be clear about what can be made to work quite 
easily.

As to "speedPost" as a quoted phrase, that's a different beast entirely due 
to the semantics of phrases, which is that they are an implicit "AND" 
operator - all tokens must match - and you and your users must be aware that 
camel case is generating multiple tokens. So, ALL of the generated WDF 
tokens must match - which does in fact happen if the content has the term 
"speedPost", but not in the case of content only containing "speedpost".

The WDF does not have any magic and cannot make all cases work. It's up to 
you, the Solr app developer, to decide which cases have the highest priority 
for you and then to accept the cases that won't work given your priorities.

Maybe that caveat wasn't made clearly enough for you early enough on.

And if you really need to get an absolute 100% of all cases, which most Solr 
applications do not, you will need to do application-specific query 
filtering in your application layer in front of Solr.

-- Jack Krupansky

-----Original Message----- 
From: vicky desai
Sent: Tuesday, August 20, 2013 8:28 AM
To: solr-user@lucene.apache.org
Subject: Re: struggling with solr.WordDelimiterFilterFactory

Hi Jack,

As mentioned earlier, part of the issue was resolved by the two fixes I
mentioned above, and for the query you mentioned I am getting the same result
as you.
What is not working, though, is the query *q=content:"speedPost"* with the
text enclosed in inverted commas.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085658.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Jack,

As mentioned earlier, part of the issue was resolved by the two fixes I
mentioned above, and for the query you mentioned I am getting the same result
as you.
What is not working, though, is the query *q=content:"speedPost"* with the
text enclosed in inverted commas.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085658.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Jack,

Thanks for the explanation.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085661.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Jack Krupansky <ja...@basetechnology.com>.

Just to be clear for other readers, if you have "speedpost" in the index and 
you query "speedPost" using the "OR" operator and the WDF set to "catenate 
all", and use lower case filter, the query should work fine. If it fails in 
your case, well, maybe there is something else wrong... somewhere.

I tried this with the standard 4.4 example schema, adding this field type:

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and adding this field:

<field name="wdf_text" type="text_wdf" indexed="true" stored="true" multiValued="false"/>

And indexing this data:

curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "wdf_text": "This is the speedpost case."},
{"id": "doc-2", "wdf_text": "This is the speed post case."},
{"id": "doc-3", "wdf_text": "This is the speedPost case."},
{"id": "doc-4", "wdf_text": "This is the SpeedPost case."},
{"id": "doc-5", "wdf_text": "This is the Speed Post case."}]'

And this query:

curl "http://localhost:8983/solr/select/?q=speedpost&df=wdf_text&indent=true&wt=json"

Returns the first, third, and fourth docs, as expected.

And this query:

curl "http://localhost:8983/solr/select/?q=speedPost&df=wdf_text&indent=true&wt=json"

Returns all five docs, as expected.

Note: the default for q.op is "OR".

So, please try the same experiment yourself, and then tell us how your 
config/schema is different than this test case.

-- Jack Krupansky

-----Original Message----- 
From: vicky desai
Sent: Tuesday, August 20, 2013 7:50 AM
To: solr-user@lucene.apache.org
Subject: Re: struggling with solr.WordDelimiterFilterFactory

Hi Erick,

I was going to come to that. If I have the word *speedpost* in the index
and I don't use catenation on the query side, then a query for the word
speedPost won't fetch any results. It might then make sense to
remove the WDFF from the query analyzer entirely and search for a few possible
combinations to find all matching docs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085650.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Erick,

I was going to come to that. If I have the word *speedpost* in the index
and I don't use catenation on the query side, then a query for the word
speedPost won't fetch any results. It might then make sense to
remove the WDFF from the query analyzer entirely and search for a few possible
combinations to find all matching docs.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085650.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Erick Erickson <er...@gmail.com>.

OK, here's where you can drive yourself mad with the subtle
variations between how WordDelimiterFilterFactory splits and
recombines the tokens. Take a look at the stock distro; you'll
see that the index-time and query-time settings for WDFF
are slightly different.

The idea is that if you do things like add the split terms and the
concatenated term to the index you may not need to add them
all the same way at query time.

Now that you have the basic bits operating, try using
different settings in the index and query analyzer sections of
your field type, as per the example.
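
As an illustration of that split, here is a sketch of a field type with separate index-time and query-time analyzers. The settings are modeled on the text_en_splitting type in the stock example schema; the type name "text_wdf_split" is hypothetical, and the filter chain is trimmed to the pieces discussed in this thread.

```xml
<!-- schema.xml sketch: the type name "text_wdf_split" is hypothetical -->
<fieldType name="text_wdf_split" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: emit the split parts and the catenated form -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: emit only the split parts, no catenation -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One trade-off to keep in mind: with catenation switched off at query time, a query for speedPost yields only the terms speed and post, so it would not be expected to match a document whose indexed text is the single word speedpost. Whether that case matters is an application decision.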

Best
Erick

On Tue, Aug 20, 2013 at 3:44 AM, vicky desai <vi...@germinait.com>wrote:

> Hi All,
>
> There were two fixes for the issue I was facing
> 1. By changing the version in schema form* 1.1* to *1.5*
> OR
> 2. keeping the version to 1.1 and adding
> *autoGeneratePhraseQueries*="false"
> to the field type
>
> However the issue is not completely resolved yet
> on searching for content:speedPost the output of debug query is as follows
> <str name="parsedquery_toString">cContent:speedpost cContent:speed
> cContent:post cContent:speedpost</str>
>
> But If i search for content:"speedPost" the output of the debug query is as
> follows
> <str name="parsedquery_toString">cContent:"(speedpost speed) (post
> speedpost)"</str>
>
> This gives incorrect results
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085605.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi All,

There were two fixes for the issue I was facing:
1. Changing the version in the schema from *1.1* to *1.5*
OR
2. Keeping the version at 1.1 and adding *autoGeneratePhraseQueries*="false"
to the field type
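
The two alternatives can be sketched as schema.xml fragments. The schema name "example" is an assumption, and the analyzer chain is simply copied from the original post. The first fragment relies on autoGeneratePhraseQueries defaulting to false from schema version 1.4 onward.

```xml
<!-- Option 1: declare a newer schema version on the root element
     (the name "example" is an assumption; other content omitted) -->
<schema name="example" version="1.5">
  <!-- types and fields go here -->
</schema>

<!-- Option 2: keep version 1.1 and switch the behavior off per field type -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Either change only affects unquoted terms such as speedPost. An explicitly quoted query is still parsed as a phrase.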

However, the issue is not completely resolved yet.
On searching for content:speedPost, the output of the debug query is as follows:
<str name="parsedquery_toString">cContent:speedpost cContent:speed
cContent:post cContent:speedpost</str>

But if I search for content:"speedPost", the output of the debug query is as
follows:
<str name="parsedquery_toString">cContent:"(speedpost speed) (post
speedpost)"</str>

This gives incorrect results.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085605.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Aloke,

After taking the schema.xml and solrconfig.xml with the changes you mentioned,
it worked fine. However, simply making these changes in schema.xml doesn't
work, so it seems there is an issue with some configuration in
solrconfig.xml. I will figure that out and post it here.

Anyway, thanks a lot to everyone for being patient and resolving my
query.

Regards,
Vicky



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085447.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Aloke Ghoshal <al...@gmail.com>.

Location of the schema.xml:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/collection1/conf/schema.xml


On Mon, Aug 19, 2013 at 6:52 PM, Aloke Ghoshal <al...@gmail.com> wrote:

> Here you go, it is the default 4.2.1 schema.xml (
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/solr.xml),
> with the following additions:
>
> <!-- Added these fields -->
> <field name="Content" type="text_general" indexed="true" stored="true"
> multiValued="false"/>
> <field name="ContTest" type="text_general" indexed="true" stored="true"
> multiValued="false"/>
>
> <!-- Changed this fieldType -->
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
>
>     <analyzer>
>             <tokenizer class="solr.WhitespaceTokenizerFactory" />
>             <filter class="solr.WordDelimiterFilterFactory"
> splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1"
> preserveOriginal="1"/>
>             <filter class="solr.LowerCaseFilterFactory" />
>         </analyzer>
>     </fieldType>
>
>
> Test with the field *ContTest*.
>
> Regards,
> Aloke
>
>
> On Mon, Aug 19, 2013 at 6:36 PM, vicky desai <vi...@germinait.com>wrote:
>
>> Hi Aloke,
>>
>> I have multiple fields in my schema which are of type text.  i tried the
>> same case on all the fields. Not working for me on any of them.  If
>> possible
>> for u can u please post your dummy solrconfig.xml and schema.xml. I can
>> replace them and check
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085432.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Aloke Ghoshal <al...@gmail.com>.

Here you go, it is the default 4.2.1 schema.xml (
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/solr.xml),
with the following additions:

<!-- Added these fields -->
<field name="Content" type="text_general" indexed="true" stored="true"
multiValued="false"/>
<field name="ContTest" type="text_general" indexed="true" stored="true"
multiValued="false"/>

<!-- Changed this fieldType -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Test with the field *ContTest*.

Regards,
Aloke


On Mon, Aug 19, 2013 at 6:36 PM, vicky desai <vi...@germinait.com>wrote:

> Hi Aloke,
>
> I have multiple fields in my schema which are of type text.  i tried the
> same case on all the fields. Not working for me on any of them.  If
> possible
> for u can u please post your dummy solrconfig.xml and schema.xml. I can
> replace them and check
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085432.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Aloke,

I have multiple fields in my schema which are of type text. I tried the
same case on all of them, and it does not work on any. If possible,
could you please post your dummy solrconfig.xml and schema.xml? I can
replace mine with them and check.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085432.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Aloke Ghoshal <al...@gmail.com>.

Hi Vicky,

Please check whether you have a second, multiValued field named
"content" defined in your schema.xml. It is typically part of the default
schema definition and is different from the one you initially posted, which had
"Content" with a capital C.

Here's the debugQuery on my system (with both fields co-existing in the
schema.xml & mapped to exactly the same fieldType definition given by you
above):

1. *content:speedPost*
<str name="rawquerystring">content:speedPost</str>
  <str name="querystring">content:speedPost</str>
  <str name="parsedquery">MultiPhraseQuery(content:"(speedpost speed) (post
speedpost)")</str>
  <str name="parsedquery_toString">content:"(speedpost speed) (post
speedpost)"</str>

Vs.

2. *Content:speedPost*
<str name="rawquerystring">Content:speedPost</str>
  <str name="querystring">Content:speedPost</str>
  <str name="parsedquery">Content:speedpost Content:speed Content:post
Content:speedpost</str>
  <str name="parsedquery_toString">Content:speedpost Content:speed
Content:post Content:speedpost</str>

Also as Erick mentioned both examples work fine for me as well.
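
To make the point about the two field names concrete: Solr field names are case-sensitive, so a schema can hold both of the definitions sketched below, and a query on content: goes to a different field than a query on Content:. The attribute values on the lowercase field are illustrative assumptions and may differ from your copy of the default schema.

```xml
<!-- Lowercase field of the kind shipped in the default schema
     (attribute values are assumptions; check your own schema.xml) -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

<!-- Field from the original post, with a capital C -->
<field name="Content" type="text" indexed="true" stored="true" multiValued="false"/>
```

If the two fields use different field types, or live under different schema versions, the same query text can be parsed differently depending on which name is used.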

Regards,
Aloke

On Mon, Aug 19, 2013 at 5:34 PM, vicky desai <vi...@germinait.com>wrote:

> Hi,
>
> Another observation while testing
>
> Docs having the value for content field as below
> 1. content:speedPost
> 2. content:sPeedpost
> 3. content:speEdpost
> 4. content:speedposT
>
> matches the query q=content:speedPost. So basically if in the entire word
> there is one 1 letter that is camel cased then it matches the query.
> however
> content:speedpost with all letters lowercase is not found to be a match
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085421.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi,

Another observation while testing

Docs having any of the following values for the content field
1. content:speedPost
2. content:sPeedpost
3. content:speEdpost
4. content:speedposT

match the query q=content:speedPost. So basically, if any one letter in the
word is upper case, the doc matches the query. However,
content:speedpost, with all letters lowercase, is not a match.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085421.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Erick,

These are the request handlers defined in solrconfig.xml 

<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/replication" class="solr.ReplicationHandler" />



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085417.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Erick Erickson <er...@gmail.com>.

Well, the case of your parsedQuery field _name_ (i.e. content) does not
match the case of your field definition (i.e. Content). This may just be an
artifact, however.

That said, the MultiPhraseQuery is probably coming from your request
handler definition. Can we see that too?

Erick


On Mon, Aug 19, 2013 at 6:01 AM, vicky desai <vi...@germinait.com>wrote:

> Hi,
>
> I have created a new index. So reindexing shouldnt be the issue.
> Analysis page shows me correct result and match should be found as per the
> analysis page.But no output on actual query
>
> The Output of debug query is as follows
> <str name="rawquerystring">content:speedPost</str>
> <str name="querystring">content:speedPost</str>
> <str name="parsedquery">MultiPhraseQuery(content:"(speedpost speed) (post
> speedpost)")</str>
> <str name="parsedquery_toString">content:"(speedpost speed) (post
> speedpost)"</str>
>
> I dont understand the output for MultiPhraseQuery. Can anyone suggest a
> good
> read for the same.
> Erik - I m searching for the correct field name. But still no output
>
> One suprising fact is if my index word is speedPost and I query for
> speedpost I find a match but vice versa doesnt work
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085405.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi,

I have created a new index, so reindexing shouldn't be the issue.
The analysis page shows the correct result, and per the analysis page a match
should be found. But the actual query returns nothing.

The output of the debug query is as follows:
<str name="rawquerystring">content:speedPost</str>
<str name="querystring">content:speedPost</str>
<str name="parsedquery">MultiPhraseQuery(content:"(speedpost speed) (post
speedpost)")</str>
<str name="parsedquery_toString">content:"(speedpost speed) (post
speedpost)"</str>

I don't understand the output for MultiPhraseQuery. Can anyone suggest a good
read on it?
Erick - I am searching on the correct field name, but still get no output.

One surprising fact: if my indexed word is speedPost and I query for
speedpost, I find a match, but the reverse doesn't work.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085405.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Erick Erickson <er...@gmail.com>.

Vicky:

Both examples work fine for me. Can you show us the results of adding
&debug=query? It does NOT work if I search content:speedPost rather than
Content:speedPost, though. Are you sure the case of your field name
matches exactly?

Also, be sure to re-index your data. Use the admin/analysis page to see
what the transformations are. Also, take a look at the admin/schema browser
to see what's actually _in_ your index.

What does the admin/analysis page show?

And, BTW, it's somewhat inefficient to have the exact same analyzer in both
cases, though that's a fine place to start. If you have 'speed', 'post' and
'speedpost' in your index, there's no need to catenate them all back in the
query; that's why the default WDFF is set up the way it is. But I'd only try
refining it after I figured out what's wrong with your setup.

FWIW,
Erick

On Fri, Aug 16, 2013 at 8:38 AM, Aloke Ghoshal <al...@gmail.com> wrote:

> Hi,
>
> That's correct the Analyzers will get applied to both Index & Query time.
> In fact I do get results back for speedPost with this field definition.
>
> Regards,
> Aloke
>
>
> On Fri, Aug 16, 2013 at 5:21 PM, vicky desai <vicky.desai@germinait.com
> >wrote:
>
> > Hi,
> >
> > Another Example I found is q=Content:wi-fi doesn't match for documents
> with
> > word wifi. I think it is not catenating the query keywords correctly
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085030.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Aloke Ghoshal <al...@gmail.com>.

Hi,

That's correct: the analyzers will be applied at both index and query time.
In fact, I do get results back for speedPost with this field definition.

Regards,
Aloke

On Fri, Aug 16, 2013 at 5:21 PM, vicky desai <vi...@germinait.com>wrote:

> Hi,
>
> Another Example I found is q=Content:wi-fi doesn't match for documents with
> word wifi. I think it is not catenating the query keywords correctly
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085030.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi,

Another example I found: q=Content:wi-fi doesn't match documents containing
the word wifi. I think it is not catenating the query keywords correctly.



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085030.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by vicky desai <vi...@germinait.com>.

Hi Aloke,

I am using the same analyzer for indexing as well as querying, so
LowerCaseFilterFactory should work for both, right?



--
View this message in context: http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021p4085025.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory

Posted by Aloke Ghoshal <al...@gmail.com>.

Hi,

Based on your WhitespaceTokenizerFactory, and due to the
LowerCaseFilterFactory, the words actually indexed are:
speed, post, speedpost

You should get results for: q=Content:speedpost

So either remove the LowerCaseFilterFactory or add the
LowerCaseFilterFactory to the query-time analyzer as well.

Regards,
Aloke




On Fri, Aug 16, 2013 at 4:53 PM, vicky desai <vi...@germinait.com>wrote:

> Hi All,
>
> I have a query regarding the use of wordDelimiterFilterFactory.  My schema
> definition for the text field is as follows
>
>         <fieldType name="text" class="solr.TextField"
>                         positionIncrementGap="100">
>                         <analyzer>
>                                 <tokenizer
> class="solr.WhitespaceTokenizerFactory" />
>                                 <filter
> class="solr.WordDelimiterFilterFactory"
>                                         splitOnCaseChange="1"
> generateWordParts="1" generateNumberParts="1"
> catenateWords="1"
>                                         catenateNumbers="1"
> catenateAll="1"  preserveOriginal="1"/>
>                                 <filter
> class="solr.LowerCaseFilterFactory" />
>                         </analyzer>
>                 </fieldType>
>
> <field name="Content" type="text" indexed="true" stored="true"
> multiValued="false"/>
>
> If I make the following query q=Content:speedPost
>
> then docs having Content *speed post *are matched which is as expected but
> docs having Content *speedpost* do not match.
>
> Can anybody please highlight if I am going incorrect somewhere
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-tp4085021.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>