Posted to java-user@lucene.apache.org by Jean-Claude Dauphin <jc...@gmail.com> on 2013/12/09 22:15:38 UTC

Why PhraseQuery translate stopwords to "?"

Hi,

My application uses an analyzer with a StopFilter. PhraseQuery
translates queries containing stopwords by replacing each stopword with a "?"
character. For example, "Java and Lucene" becomes "Java ? Lucene" and "to
contribute" becomes "? contribute". The term sequences are indexed
without stopwords. Searching works when the stopword starts the
phrase, but returns no results when the "?" is not at the beginning.

Searching for phrases without stopwords works well.

Any explanation/FAQ/user-list-message that explains why PhraseQuery
translates stopwords to "?" would be appreciated.

Thank you in advance

Jean-Claude Dauphin

-- 
Jean-Claude Dauphin

jc.dauphin@gmail.com
jc.dauphin@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org

Re: Why PhraseQuery translate stopwords to "?"

Posted by Jack Krupansky <ja...@basetechnology.com>.
In theory, the query with holes (position increments) for stop words should 
work... unless you originally indexed the data without the stop word filter. 
Any time you change the filters, you typically need to reindex the data.

-- Jack Krupansky
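
To illustrate the mismatch described above, here is a minimal sketch in plain
Java. It is a toy model and not Lucene code: the class PhraseHoleMismatch, the
stop set, and the sample sentence are all made up for the illustration. It
assumes that an exact phrase match only compares the relative positions of the
terms, which would explain why a leading hole is harmless while a hole in the
middle of the phrase finds nothing in an index built without holes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PhraseHoleMismatch {
    // Hypothetical stop set, chosen only for this example.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("and", "to", "of"));

    /** A kept term together with the position assigned to it. */
    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    /** Drops stop words; if keepHoles is true, each dropped word still consumes a position. */
    static List<Token> analyze(String text, boolean keepHoles) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (STOP.contains(word)) {
                if (keepHoles) position++; // leave a gap where the stop word was
                continue;
            }
            out.add(new Token(word, position++));
        }
        return out;
    }

    static boolean hasTermAt(List<Token> doc, String term, int position) {
        for (Token t : doc) {
            if (t.position == position && t.term.equals(term)) return true;
        }
        return false;
    }

    /** Exact phrase match: all query terms must appear at the same relative offsets in the document. */
    static boolean matches(List<Token> doc, List<Token> query) {
        if (query.isEmpty()) return false;
        for (Token start : doc) {
            int shift = start.position - query.get(0).position;
            boolean all = true;
            for (Token q : query) {
                if (!hasTermAt(doc, q.term, q.position + shift)) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "learn to contribute to Java and Lucene";
        List<Token> indexNoHoles = analyze(text, false);
        List<Token> indexWithHoles = analyze(text, true);

        // Leading hole: only one term remains, so the gap does not matter.
        System.out.println(matches(indexNoHoles, analyze("to contribute", true)));     // true
        // Hole in the middle: the query wants a gap that the index never recorded.
        System.out.println(matches(indexNoHoles, analyze("Java and Lucene", true)));   // false
        // Same query against an index that was built with holes.
        System.out.println(matches(indexWithHoles, analyze("Java and Lucene", true))); // true
        // Holes disabled on both sides.
        System.out.println(matches(indexNoHoles, analyze("Java and Lucene", false)));  // true
    }
}
```

In this model the index and the query agree only when both sides treat stop
word positions the same way, which is why reindexing is usually needed after
the filter configuration changes.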

-----Original Message----- 
From: Jean-Claude Dauphin
Sent: Tuesday, December 10, 2013 4:21 AM
To: java-user@lucene.apache.org
Subject: Re: Why PhraseQuery translate stopwords to "?"

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Why PhraseQuery translate stopwords to "?"

Posted by Jean-Claude Dauphin <jc...@gmail.com>.
Thanks a lot Jack for this explanation!

I changed the custom query analyzer so that it no longer increments the
position of the subsequent term for each stop word, as follows:
        // stop words removal
        StopFilter stopFilter = new StopFilter(Lucene.MATCH_VERSION,
                result,
                stopSet);
        // Needed to get rid of Question mark placeholders for stopwords
        stopFilter.setEnablePositionIncrements(false);

Now stopwords are no longer translated to "?" and it works
as I expected, i.e.:
"Biology of fresh, brackish and saline water as it contributes to tropical
delta formation" is translated to:
Title:"BIOLOGY FRESH BRACKISH SALINE WATER CONTRIBUTES TROPICAL DELTA
FORMATION"

The problem with the stopword placeholder "?" query is that the search does
not find any results, while the query without "?" gives the correct results.

Thanks again, I was struggling with this issue the last 2 days.

Jean-Claude Dauphin





Re: Why PhraseQuery translate stopwords to "?"

Posted by Jack Krupansky <ja...@basetechnology.com>.
The analyzer is generating holes for the stop words - the position of the 
subsequent term is incremented an extra time for each stop word so that 
their positions are maintained.

-- Jack Krupansky
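
As a rough illustration of these holes, the sketch below is a toy stop word
filter in plain Java. It is not Lucene's implementation: the class name
StopwordHoles, the stop set, and the render helper are assumptions made for
this example. It prints one slot per position and shows "?" for a position
whose stop word was removed, which mirrors the "?" seen in the printed phrase
query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordHoles {
    // Hypothetical stop set, chosen only for this example.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("and", "to", "of"));

    /**
     * Renders the phrase one slot per position.
     * With keepHoles = true a removed stop word still occupies its position and is shown as "?".
     * With keepHoles = false the remaining terms close up and no "?" appears.
     */
    static String render(String text, boolean keepHoles) {
        List<String> slots = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!STOP.contains(word)) {
                slots.add(word);   // kept term
            } else if (keepHoles) {
                slots.add("?");    // empty position left behind by the stop word
            }
        }
        return String.join(" ", slots);
    }

    public static void main(String[] args) {
        System.out.println(render("Java and Lucene", true));   // java ? lucene
        System.out.println(render("to contribute", true));     // ? contribute
        System.out.println(render("Java and Lucene", false));  // java lucene
    }
}
```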
