Posted to java-user@lucene.apache.org by Elmer van Chastelet <ev...@gmail.com> on 2012/04/23 21:27:39 UTC

PhoneticFilterFactory 's inject parameter

Hi all,

(scroll to bottom for question)

I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create one document for each word in the English
dictionary. Each document contains a single search field holding the
word after it has been processed by the following analyzer definition
(in our own DSL syntax, which gets transformed to Java):

analyzer soundslike{
     tokenizer = KeywordTokenizer
     tokenfilter = LowerCaseFilter
     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}

I can run the web app and I get results that indeed (in some way) sound 
like the original query term.

But what confuses me is the ranking of the results, knowing that I set
the inject param to true. If I search for the query term 'compete', the
parsed query becomes '(value:KMPT value:compete)', and therefore I
expect the word 'compete' to be ranked higher than any other
word... but this wasn't the case.
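
For readers who want a concrete picture of what inject is expected to
do, here is a minimal toy model of the analyzer chain in plain Java. It
is not Lucene code: the class InjectSketch, the method analyze and the
hard-coded fakeEncoder are all made up for illustration, and the code
'KMPT' is simply taken from the parsed query quoted above. The token
order (phonetic code first, then the original term) is an assumption
based on that parsed query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class InjectSketch {

    // Toy model of KeywordTokenizer -> LowerCaseFilter -> PhoneticFilter.
    // The encoder is a plain function here, not a real phonetic encoder.
    public static List<String> analyze(String input,
                                       Function<String, String> encoder,
                                       boolean inject) {
        List<String> tokens = new ArrayList<>();
        // KeywordTokenizer: the whole input is one token; then lowercase it.
        String term = input.toLowerCase(Locale.ROOT);
        String code = encoder.apply(term);
        // No usable phonetic code: pass the original term through unchanged.
        if (code == null || code.isEmpty() || code.equals(term)) {
            tokens.add(term);
            return tokens;
        }
        tokens.add(code);        // the phonetic code is always emitted
        if (inject) {
            tokens.add(term);    // inject=true keeps the original term too
        }
        return tokens;
    }

    // Hypothetical stand-in for DoubleMetaphone; knows a single word only.
    public static String fakeEncoder(String term) {
        return "compete".equals(term) ? "KMPT" : "";
    }

    public static void main(String[] args) {
        System.out.println(analyze("Compete", InjectSketch::fakeEncoder, true));  // [KMPT, compete]
        System.out.println(analyze("Compete", InjectSketch::fakeEncoder, false)); // [KMPT]
    }
}
```

If the index really was built with inject="true", both tokens should be
present as terms of the 'value' field for the document 'compete', so
both clauses of the parsed query would be expected to match it.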

Looking further at the explanation of results, I saw that the term
'compete' in the parsed query is totally absent, and only the phonetic
encoding seems to affect the ranking:

  * COMPETITOR
      o 4.368826 = (MATCH) sum of:
          + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
              # 0.52838135 = queryWeight(value:KMPT), product of:
                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
                  * 0.063904315 = queryNorm
              # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
                product of:
                  * 1.0 = tf(termFreq(value:KMPT)=1)
                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
                  * 1.0 = fieldNorm(field=value, doc=3174)
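
The numbers in this explanation can be re-derived by hand, which makes
it easier to see that only one clause contributes. The sketch below
assumes the classic idf formula 1 + ln(maxDocs / (docFreq + 1)), which
is my understanding of what the default similarity in Lucene 3.x uses;
it does reproduce the 8.26832 shown above. The class and method names
are made up for this example.

```java
public class ExplainCheck {

    // Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1)).
    public static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // Score of a single term clause, following the structure of the
    // explanation: weight = queryWeight * fieldWeight.
    public static double termScore(double idf, double queryNorm,
                                   double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;      // 0.52838135 in the output above
        double fieldWeight = tf * idf * fieldNorm; // 8.26832 in the output above
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = idf(150, 216555);
        double score = termScore(idf, 0.063904315, 1.0, 1.0);
        System.out.printf("idf=%.5f score=%.5f%n", idf, score);
    }
}
```

The result agrees with the reported 4.368826 to within rounding, so the
whole score comes from value:KMPT. Had value:compete matched as well,
the "sum of" line would be expected to list a second weight(...) entry
for that term.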

The next thing I did was to run our friend Luke. In Luke, I opened the
documents tab, and started iterating over some terms for the field 
'value' until I found 'compete'. When I hit 'Show All Docs', the search 
tab opens and it displays the one and only document holding this value 
(i.e. the document representing the word 'compete'). It shows the query: 
'value:compete '. Then, when I hit the search button again (query is 
still 'value:compete '), it says that there are no results !?

Probably, the 'Show All Docs' button does something different than 
performing a query using the search tab in Luke.

Q: Can somebody explain why the injected original terms seem to get
ignored at query time? Or could it be related to the name of the search
field ('value'), or something else?

We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).

-Elmer



Re: PhoneticFilterFactory 's inject parameter

Posted by Ian Lea <ia...@gmail.com>.
There are useful tips in the FAQ,
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.

I still think you should come up with small self-contained example code.


--
Ian.


On Wed, Apr 25, 2012 at 4:02 PM, Elmer van Chastelet
<ev...@gmail.com> wrote:
> Thanks for your suggestion Ian, but I just found out that if I replace the
> KeywordTokenizer with a WhitespaceTokenizer, all seems to work fine.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: PhoneticFilterFactory 's inject parameter

Posted by Elmer van Chastelet <ev...@gmail.com>.
Thanks for your suggestion Ian, but I just found out that if I replace 
the KeywordTokenizer with a WhitespaceTokenizer, all seems to work fine.

Just to test what happens, I created another field 'orig', using this 
analyzer:
analyzer KeywordLowered{
     tokenizer = KeywordTokenizer
     tokenfilter = LowerCaseFilter
}

Guess what... exactly the same problem, also in Luke.
It finds no documents for the query:
orig:strange
even though the term 'strange' is in the index for the field 'orig'.

Does anybody have a clue why documents are not matched when using the
KeywordTokenizer? Note that none of the queries or terms contain
whitespace.


Thanks again.
-Elmer


On 04/25/2012 02:53 PM, Ian Lea wrote:
> You seem to be quietly going round in circles, by yourself!  I suggest
> a small self-contained program/test case with a RAM index created from
> scratch.  You can then experiment with inject on or off and if you
> still can't figure it out, post the code and hopefully someone will be
> able to help you make sense of it.


Re: PhoneticFilterFactory 's inject parameter

Posted by Ian Lea <ia...@gmail.com>.
You seem to be quietly going round in circles, by yourself!  I suggest
a small self-contained program/test case with a RAM index created from
scratch.  You can then experiment with inject on or off and if you
still can't figure it out, post the code and hopefully someone will be
able to help you make sense of it.

Make sure you tell us what version of Lucene you are using.  If not
the latest, wouldn't hurt to try with the latest.


--
Ian.


On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
<ev...@gmail.com> wrote:
> I keep replying to myself, it all gets a bit confusing.
> The problem still exists and I don't understand why, and why it worked once.


Re: PhoneticFilterFactory 's inject parameter

Posted by Elmer van Chastelet <ev...@gmail.com>.
I keep replying to myself, so this is all getting a bit confusing.
The problem still exists, and I don't understand why, or why it worked once.

I have the same behavior again as posted in my first mail:
- Inject parameter is set to true.
- The index has _no deleted documents_ and is optimized.
- The term 'compete' is in there.
- If I ask Luke to show all docs for term 'compete' it shows me the one 
and only document that represents this word. But...
- If I perform the query 'value:compete' in Luke again, it says there
are no results.

Here is the index I'm currently using. It contains various fields for 
the available phonetic filter encoders:
https://www.box.com/s/34212e82227e102f6734

Can somebody explain this behavior? What's the real use of the inject 
parameter of the PhoneticFilterFactory?

Thanks in advance.

-Elmer


On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
> Problem solved. Long story short: for some reason I had deleted 
> documents in the index and the non-deleted documents used the phonetic 
> filter with inject set to false.
>
> Works fine now :)


Re: PhoneticFilterFactory 's inject parameter

Posted by Elmer van Chastelet <ev...@gmail.com>.
Problem solved. Long story short: for some reason I had deleted 
documents in the index and the non-deleted documents used the phonetic 
filter with inject set to false.

Works fine now :)



Re: PhoneticFilterFactory 's inject parameter

Posted by Elmer van Chastelet <ev...@gmail.com>.
Little correction:

> Looking further at the explanation of results, I saw that the term 
> 'compete' in the parsed query is totally absent, and only the phonetic 
> encoding seems to affect the ranking...

should be:
> Looking further at the explanation of results, I saw that the term
> 'compete' is totally absent _in the scoring_*, and only the
> phonetic encoding seems to affect the ranking...

* and /present/ in the parsed query as previously stated.

-Elmer