Posted to java-user@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2007/08/27 16:01:18 UTC

Re: Searching Diacritics

On 27 Aug 2007, at 16:03, anorman wrote:

> I have a searchable index of documents that contain French and Spanish
> diacritics (è, é, À, etc.).  I would like to make the content searchable
> so that a search for a word such as "Amèrique" or "Amerique" (without
> the diacritic) returns the same results.
>
> Has anyone set up something similar?

ISOLatin1AccentFilter
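
ISOLatin1AccentFilter is a Lucene TokenFilter that replaces accented ISO Latin 1 characters with their unaccented equivalents. The sketch below is not Lucene's implementation; it only illustrates the idea of accent folding using the JDK's java.text.Normalizer. The class and method names (AccentFoldingSketch, foldAccents) are made up for this example, and characters without a canonical decomposition, such as ø or ß, are left unchanged by it.

```java
import java.text.Normalizer;

public class AccentFoldingSketch {

    // Decompose each accented character into base letter + combining mark
    // (NFD), then drop the combining marks. Illustration only; this is not
    // how Lucene's filter is written.
    static String foldAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e8 is "e with grave accent", \u00c0 is "A with grave accent".
        System.out.println(foldAccents("Am\u00e8rique")); // prints "Amerique"
        System.out.println(foldAccents("c\u00e8dulas"));  // prints "cedulas"
        System.out.println(foldAccents("\u00c0"));        // prints "A"
    }
}
```

Once both the indexed terms and the query terms pass through the same folding step, "Amèrique" and "Amerique" end up as the same term.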

-- 
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Searching Diacritics

Posted by anorman <an...@mun.ca>.

That was the problem; it works perfectly now!

Thanks for the help!

-Albert




karl wettin-3 wrote:
> Do you use the same analyzer when searching as when creating the index?

Re: Searching Diacritics

Posted by Karl Wettin <ka...@gmail.com>.

On 27 Aug 2007, at 20:30, anorman wrote:

> I've tried to implement an analyzer that differs little from the
> standard one, apart from using:
>
> result = new ISOLatin1AccentFilter(result);  in the tokenStream method.
>
> Everything appears to work, but with that change my search no longer
> works for any word with diacritics.  Without the filter it finds words
> such as "cèdulas" but not "cedulas"; with the filter it finds neither.
> It just appears to be stripping the word out altogether.

Do you use the same analyzer when searching as when creating the index?
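
To see why this matters, here is a minimal sketch using plain JDK collections rather than Lucene. It assumes the accent folding was added on the search side only, against an index built without it. The two UnaryOperator "analyzers" and the names AnalyzerMismatchSketch, index and matches are hypothetical stand-ins, not Lucene classes.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class AnalyzerMismatchSketch {

    // Stand-in for an analyzer without accent folding: lower-casing only.
    static final UnaryOperator<String> PLAIN = s -> s.toLowerCase(Locale.ROOT);

    // Stand-in for an analyzer with accent folding: lower-case, then strip marks.
    static final UnaryOperator<String> FOLDING = s ->
            Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");

    // "Index" the words by storing each analyzed term in a set.
    static Set<String> index(UnaryOperator<String> analyzer, String... words) {
        Set<String> terms = new HashSet<>();
        for (String word : words) {
            terms.add(analyzer.apply(word));
        }
        return terms;
    }

    // A query matches only if its analyzed form equals an indexed term.
    static boolean matches(Set<String> terms, UnaryOperator<String> analyzer, String query) {
        return terms.contains(analyzer.apply(query));
    }

    public static void main(String[] args) {
        String accented = "c\u00e8dulas"; // "cedulas" with a grave accent on the e

        Set<String> plainIndex = index(PLAIN, accented);
        Set<String> foldedIndex = index(FOLDING, accented);

        // Folding at search time only: the index still holds the accented term,
        // so neither spelling of the query can match it.
        System.out.println(matches(plainIndex, FOLDING, accented));   // prints "false"
        System.out.println(matches(plainIndex, FOLDING, "cedulas"));  // prints "false"

        // Folding at both index and search time: both spellings match.
        System.out.println(matches(foldedIndex, FOLDING, accented));  // prints "true"
        System.out.println(matches(foldedIndex, FOLDING, "cedulas")); // prints "true"
    }
}
```

In other words, after changing the analyzer the documents have to be re-indexed with it before searches can be expected to match.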

-- 
karl



Re: Searching Diacritics

Posted by anorman <an...@mun.ca>.

I've tried to implement an analyzer that differs little from the standard
one, apart from using:

result = new ISOLatin1AccentFilter(result);  in the tokenStream method.

Everything appears to work, but with that change my search no longer works
for any word with diacritics.  Without the filter it finds words such as
"cèdulas" but not "cedulas"; with the filter it finds neither.  It just
appears to be stripping the word out altogether.

Any suggestions?





Re: Searching Diacritics

Posted by anorman <an...@mun.ca>.

Can I do this at search time rather than index time?  Below is the code
that handles the searching. Where would I use such a filter?

Thanks for the help!




package search.lucene.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import search.lucene.index.IndexManager;

/**
 * This class is used to search the
 * Lucene index and return search results
 */
public class SearchManager {

    private String searchWord;

    private IndexManager indexManager;

    private Analyzer analyzer;

    public SearchManager(String searchWord) {
        this.searchWord   = searchWord;
        this.indexManager = new IndexManager();
        // The query is analyzed with the plain StandardAnalyzer.
        this.analyzer     = new StandardAnalyzer();
    }

    /**
     * Do the search: parse the search word against the "content" field
     * and copy each hit into a SearchResultBean.
     */
    public List search() {
        List searchResult = new ArrayList();

        IndexSearcher indexSearcher = null;
        try {
            indexSearcher = new IndexSearcher(indexManager.getIndexDir());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

        QueryParser queryParser = new QueryParser("content", analyzer);
        Query query = null;
        try {
            query = queryParser.parse(searchWord);
        } catch (ParseException e) {
            e.printStackTrace();
        }

        if (null != query && null != indexSearcher) {
            try {
                Hits hits = indexSearcher.search(query);
                for (int i = 0; i < hits.length(); i++) {
                    Document doc = hits.doc(i);
                    System.out.println(doc.get("filename"));

                    SearchResultBean resultBean = new SearchResultBean();
                    resultBean.setXMLId(doc.get("id"));
                    resultBean.setXMLTitle(doc.get("title"));
                    resultBean.setXMLAuthor(doc.get("author"));
                    resultBean.setXMLAbstract(doc.get("abstract"));
                    resultBean.setScore(hits.score(i));

                    searchResult.add(resultBean);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return searchResult;
    }
}







Re: Searching Diacritics

Posted by thomas arni <ar...@zhaw.ch>.

You can extend the StandardAnalyzer.
The only thing you have to do is override the tokenStream method like
this:

  /** Constructs a {@link StandardTokenizer} filtered by a {@link
  StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter} and
  an {@link ISOLatin1AccentFilter}. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new ISOLatin1AccentFilter(result);
    return result;
  }
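
For readers without Lucene at hand, the order of the chain above can be sketched with plain JDK classes: tokenize, lower-case, drop stop words, then fold accents. This is only an approximation under stated assumptions. The regex split is a crude substitute for StandardTokenizer, the stop word list is illustrative rather than Lucene's default, and FilterChainSketch and analyze are hypothetical names.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FilterChainSketch {

    // Illustrative stop words only; not Lucene's default list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "de", "la"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenize: split on anything that is not a letter or a digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first token
            }
            // Lower-case first, so the stop word check is case-insensitive.
            String token = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            // Fold accents last, mirroring the position of the accent filter
            // at the end of the chain.
            token = Normalizer.normalize(token, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00e8 is "e with grave accent".
        System.out.println(analyze("La C\u00e8dula de Am\u00e8rique"));
        // prints "[cedula, amerique]"
    }
}
```

Because the stop filter runs before the accent folding in this order, stop words are compared in their accented form; whether that is what you want depends on the stop list you use.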



Re: Searching Diacritics

Posted by anorman <an...@mun.ca>.

This looks like exactly what I want.  Would I use this together with
another analyzer, such as the standard one, or on its own?  Does anyone
have any code examples of implementing such a thing?

Thanks,
Albert



