Posted to java-user@lucene.apache.org by Sven Duzont <sv...@keljob.com> on 2005/04/01 18:07:03 UTC

Re[4]: Analyzer don't work with wildcard queries, snowball analyzer.

EH> I presume your analyzer normalized accented characters?  Which analyzer
EH> is that?

Yes, I'm using a custom analyzer for indexing and searching. It consists
of:
- FrenchStopFilter
- IsoLatinFilter (this is the one that replaces accented
characters)
- LowerCaseFilter
- ApostropheFilter (to handle terms with apostrophes; for instance,
"l'expérience" is decomposed into two tokens: "l" and "expérience")
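
As a rough illustration of what a chain like the one above does to a term,
here is a minimal sketch in plain Java. It is not the actual filter code
from this thread: the class name FrenchTokenSketch and both methods are
hypothetical, java.text.Normalizer stands in for the accent filter, and
stop-word removal is left out.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the analysis chain; not Lucene code.
final class FrenchTokenSketch {

    /** Decomposes accented letters (NFD) and drops the combining marks. */
    static String removeAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Splits on whitespace and apostrophes, then folds accents and case. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // \u2019 is the typographic apostrophe; the ASCII one is listed too.
        for (String raw : text.split("[\\s'\u2019]+")) {
            if (!raw.isEmpty()) {
                tokens.add(removeAccents(raw).toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

With this sketch, the input l'expérience yields the two tokens "l" and
"experience", which is the form a query term has to take to match the index.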

EH> You will need to employ some form of character normalization on 
EH> wildcard queries too.

Thanks, it works successfully. The code snippet follows.

---
 sven

/*----------------------- CODE ----------------------------*/

private static Query CreateCustomQuery(Query query)
{
  if(query instanceof BooleanQuery)  {
    final BooleanClause[] bClauses = ((BooleanQuery) query).getClauses();
    
    // The first clause is required
    if(bClauses[0].prohibited != true)
      bClauses[0].required = true;
      
    // Will parse each clause to remove accents if needed
    Term term;
    for (int i = 0; i < bClauses.length; i++)    {
      if(bClauses[i].query instanceof WildcardQuery)      {
        term = ((WildcardQuery)bClauses[i].query).getTerm();
        bClauses[i].query = new WildcardQuery(new Term(term.field(), 
            ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
      }
      if(bClauses[i].query instanceof PrefixQuery)      {
        term = ((PrefixQuery)bClauses[i].query).getPrefix();
        bClauses[i].query = new PrefixQuery(new Term(term.field(), 
            ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
      // toLowerCase because the text is lowercased during indexation
      }
    }    
  }
  else if(query instanceof WildcardQuery)  {
    final Term term = ((WildcardQuery)query).getTerm();
    query = new WildcardQuery(new Term(term.field(), 
        ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
  }
  else if(query instanceof PrefixQuery)  {
    final Term term = ((PrefixQuery)query).getPrefix();
    query = new PrefixQuery(new Term(term.field(), 
        ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
  }
  return query;
}

/*----------------------- END OF CODE ----------------------------*/
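
To show why the wildcard text has to be normalized the same way as the
indexed text, here is a small self-contained sketch. It does not use
Lucene: the class WildcardNormalizationSketch and its methods are
hypothetical, java.text.Normalizer stands in for
ISOLatin1AccentFilter.RemoveAccents, and the matcher is a simplified
approximation of wildcard semantics ('*' for any run of characters, '?' for
exactly one).

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene's WildcardQuery.
final class WildcardNormalizationSketch {

    /** Strips accents and lowercases; '*' and '?' pass through unchanged. */
    static String normalize(String pattern) {
        String decomposed = Normalizer.normalize(pattern, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Simplified wildcard match against one already-indexed term. */
    static boolean matches(String pattern, String indexedTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");          // any run of characters
            } else if (c == '?') {
                regex.append('.');           // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), indexedTerm);
    }
}
```

If the index holds the term "experience", the raw pattern Expér* does not
match it, while the normalized pattern exper* does.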

EH> 	Erik




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re[8]: Analyzer don't work with wildcard queries, snowball analyzer.

Posted by Sven Duzont <sv...@laposte.net>.

EH> Thanks for sharing that!
EH> Would you be interested in donating that to the contrib area for
EH> analyzers?  The topic of normalizing accented characters has come up
EH> often lately.   I noticed you already put the Apache license at the top
EH> of the code.

No problem, it was intended for the sandbox.

EH> When using QueryParser, you can set the default operator, which is
EH> normally OR.  It will handle setting the first (and every) clause 
EH> appropriately.  You'll need to instantiate an instance of QueryParser
EH> to set that flag (see javadocs for details).

Yes, that is what I first thought of, but they (the end users) wanted
all clauses except the first to be handled by the 'OR' operator.
I'll try to convince them that it will make my (and their) life easier
if the default operator for all clauses is 'AND' ;)

Regards,

   Sven




Re: Re[6]: Analyzer don't work with wildcard queries, snowball analyzer.

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Apr 2, 2005, at 7:01 AM, Sven Duzont wrote:
> EH> Could you share that filter with the community?
> Of course, the code is in the attachment

Thanks for sharing that!

Would you be interested in donating that to the contrib area for 
analyzers?  The topic of normalizing accented characters has come up 
often lately.   I noticed you already put the Apache license at the top 
of the code.

>>>     // The first clause is required
>>>     if(bClauses[0].prohibited != true)
>>>       bClauses[0].required = true;
> EH> Why do you flip the required flag like this?
> On the search interface, near the keyword field, there is a combo
> with 4 values :
> - KW_MODE_OR      : "Search for at least one of the terms"
> - KW_MODE_AND     : "Search for all the terms"
> - KW_MODE_PHRASE  : "Search for exact phrase"
> - KW_MODE_BOOLEAN : "Search using boolean query" (for advanced users)
>   I flip the required flag only when the boolean expression mode is
>   selected. It forces the first term to be required so the user does
>   not need to specify the "+" or "AND" operator.
>   Maybe there is a more elegant way to do this?

When using QueryParser, you can set the default operator, which is 
normally OR.  It will handle setting the first (and every) clause 
appropriately.  You'll need to instantiate an instance of QueryParser 
to set that flag (see javadocs for details).

	Erik


>   // Boolean expression
>   if (cvSearchBean.keywordModeId == KW_MODE_BOOLEAN) {
>     final Query query = QueryParser.parse(cvSearchBean.title,
>                                         FIELD_RESUME_BODY, analyzer);
>     if (query instanceof BooleanQuery) {
>       final BooleanClause[] bClauses =
>                               ((BooleanQuery) query).getClauses();
>       if (bClauses[0].prohibited != true)
>         bClauses[0].required = true;
>     }
>     bQuery.add(CreateCustomQuery(query), true, false);
>   }



Re[6]: Analyzer don't work with wildcard queries, snowball analyzer.

Posted by Sven Duzont <sv...@laposte.net>.

Hello,

EH> What about handling BooleanQuery's nested within a BooleanQuery?
EH> You'll need some recursion.
Thanks for all the hints; I've re-coded the method to handle nested
BooleanQueries.
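
The re-coded method itself is in the attachment and is not reproduced
here. As a hedged sketch of the general idea only, the recursion can look
like the following. The node classes (Node, PatternNode, GroupNode) are
hypothetical stand-ins for Lucene's Query, WildcardQuery/PrefixQuery and
BooleanQuery, and fold() uses java.text.Normalizer in place of the accent
filter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical query model for illustration; these are NOT Lucene classes.
final class NestedQuerySketch {

    interface Node {}

    /** Stand-in for a wildcard or prefix query holding raw pattern text. */
    static final class PatternNode implements Node {
        final String text;
        PatternNode(String text) { this.text = text; }
    }

    /** Stand-in for a boolean query; its clauses may be groups themselves. */
    static final class GroupNode implements Node {
        final List<Node> clauses;
        GroupNode(List<Node> clauses) { this.clauses = clauses; }
    }

    /** Strips accents and lowercases, mirroring what indexing did. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Returns a normalized copy of the tree, descending into nested groups. */
    static Node normalize(Node node) {
        if (node instanceof GroupNode) {
            List<Node> rewritten = new ArrayList<>();
            for (Node clause : ((GroupNode) node).clauses) {
                rewritten.add(normalize(clause)); // recursion covers any depth
            }
            return new GroupNode(rewritten);
        }
        if (node instanceof PatternNode) {
            return new PatternNode(fold(((PatternNode) node).text));
        }
        return node; // any other node type is returned unchanged
    }
}
```

Returning a new tree rather than mutating clauses in place is a design
choice of this sketch; the snippet earlier in the thread mutates the
clause array directly.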

EH> Could you share that filter with the community?
Of course, the code is in the attachment

>>     // The first clause is required
>>     if(bClauses[0].prohibited != true)
>>       bClauses[0].required = true;
EH> Why do you flip the required flag like this?
On the search interface, near the keyword field, there is a combo
with 4 values :
- KW_MODE_OR      : "Search for at least one of the terms"
- KW_MODE_AND     : "Search for all the terms"
- KW_MODE_PHRASE  : "Search for exact phrase"
- KW_MODE_BOOLEAN : "Search using boolean query" (for advanced users)
  I flip the required flag only when the boolean expression mode is
  selected. It forces the first term to be required so the user does
  not need to specify the "+" or "AND" operator.
  Maybe there is a more elegant way to do this?
  The code follows.

  Thanks
---
 Sven (is not a berserk)

/*-------------------------------- CODE ---------------------------*/
// keywords contained in the CV
if (cvSearchBean.keywords != null &&
    cvSearchBean.keywords.length() != 0) {
  // "All keywords" or "At least one of the keywords"
  boolean required = false;
  if ((required = cvSearchBean.keywordModeId == KW_MODE_AND) ||
       cvSearchBean.keywordModeId == KW_MODE_OR) {
    final Query q = CreateCustomQuery(QueryParser.parse(
           cvSearchBean.keywords, FIELD_RESUME_BODY, analyzer));
    if (q instanceof BooleanQuery) {
      final BooleanClause[] terms = ((BooleanQuery) q).getClauses();
      for (int i = 0; i < terms.length; i++) {
        terms[i].prohibited = false;
        terms[i].required = required;
      }
    }
    bQuery.add(q, true, false);
  }
  // Exact phrase
  if (cvSearchBean.keywordModeId == KW_MODE_PHRASE) {
    final PhraseQuery q = new PhraseQuery();
    final TokenStream ts = analyzer.tokenStream(FIELD_RESUME_BODY,
                          new StringReader(cvSearchBean.keywords));
    Token token;
    while ((token = ts.next()) != null)
      q.add(new Term(FIELD_RESUME_BODY, token.termText()));
    bQuery.add(q, true, false);
  }
  // Boolean expression
  if (cvSearchBean.keywordModeId == KW_MODE_BOOLEAN) {
    final Query query = QueryParser.parse(cvSearchBean.title,
                                        FIELD_RESUME_BODY, analyzer);
    if (query instanceof BooleanQuery) {
      final BooleanClause[] bClauses =
                              ((BooleanQuery) query).getClauses();
      if (bClauses[0].prohibited != true)
        bClauses[0].required = true;
    }
    bQuery.add(CreateCustomQuery(query), true, false);
  }
} // closes the outer keywords check

/*--------------------------END OF CODE --------------------------*/
      
      

EH> 	Erik

Re: Re[4]: Analyzer don't work with wildcard queries, snowball analyzer.

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Apr 1, 2005, at 11:07 AM, Sven Duzont wrote:
> EH> I presume your analyzer normalized accented characters?  Which analyzer
> EH> is that?
>
> Yes, I'm using a custom analyzer for indexing and searching. It consists
> of:
> - FrenchStopFilter
> - IsoLatinFilter (this is the one that will replace accented
> characters)

Could you share that filter with the community?

> EH> You will need to employ some form of character normalization on
> EH> wildcard queries too.
>
> Thanks, it works successfully. The code snippet follows.
>
> ---
>  sven
>
> /*----------------------- CODE ----------------------------*/
>
> private static Query CreateCustomQuery(Query query)
> {
>   if(query instanceof BooleanQuery)  {
>     final BooleanClause[] bClauses = ((BooleanQuery) query).getClauses();
>
>     // The first clause is required
>     if(bClauses[0].prohibited != true)
>       bClauses[0].required = true;

Why do you flip the required flag like this?

>
>     // Will parse each clause to remove accents if needed
>     Term term;
>     for (int i = 0; i < bClauses.length; i++)    {
>       if(bClauses[i].query instanceof WildcardQuery)      {
>         term = ((WildcardQuery)bClauses[i].query).getTerm();
>         bClauses[i].query = new WildcardQuery(new Term(term.field(),
>             ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
>       }

What about handling BooleanQuery's nested within a BooleanQuery?  
You'll need some recursion.

	Erik



>       if(bClauses[i].query instanceof PrefixQuery)      {
>         term = ((PrefixQuery)bClauses[i].query).getPrefix();
>         bClauses[i].query = new PrefixQuery(new Term(term.field(),
>             ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
>       // toLowerCase because the text is lowercased during indexation
>       }
>     }
>   }
>   else if(query instanceof WildcardQuery)  {
>     final Term term = ((WildcardQuery)query).getTerm();
>     query = new WildcardQuery(new Term(term.field(),
>         ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
>   }
>   else if(query instanceof PrefixQuery)  {
>     final Term term = ((PrefixQuery)query).getPrefix();
>     query = new PrefixQuery(new Term(term.field(),
>         ISOLatin1AccentFilter.RemoveAccents(term.text().toLowerCase())));
>   }
>   return query;
> }
>
> /*----------------------- END OF CODE ----------------------------*/
>
> EH> 	Erik